Guido van Rossum | aa1e140 | 1992-04-06 14:02:49 +0000 | [diff] [blame] | 1 | \documentstyle[11pt]{article} |
Guido van Rossum | 16d6e71 | 1994-08-08 12:30:22 +0000 | [diff] [blame] | 2 | \newcommand{\Cpp}{C\protect\raisebox{.18ex}{++}} |
Guido van Rossum | 2bbb3c0 | 1992-02-11 15:52:24 +0000 | [diff] [blame] | 3 | |
| 4 | \title{ |
| 5 | Interactively Testing Remote Servers Using the Python Programming Language |
| 6 | } |
| 7 | |
| 8 | \author{ |
| 9 | Guido van Rossum \\ |
Guido van Rossum | d983cde | 1995-03-15 12:53:31 +0000 | [diff] [blame] | 10 | Dept. AA, CWI, P.O. Box 94079 \\ |
Guido van Rossum | db65a6c | 1993-11-05 17:11:16 +0000 | [diff] [blame] | 11 | 1090 GB Amsterdam, The Netherlands \\ |
Guido van Rossum | 2bbb3c0 | 1992-02-11 15:52:24 +0000 | [diff] [blame] | 12 | E-mail: {\tt guido@cwi.nl} |
| 13 | \and |
| 14 | Jelke de Boer \\ |
| 15 | HIO Enschede; P.O.Box 1326 \\ |
| 16 | 7500 BH Enschede, The Netherlands |
| 17 | } |
| 18 | |
| 19 | \begin{document} |
| 20 | |
| 21 | \maketitle |
| 22 | |
| 23 | \begin{abstract} |
| 24 | This paper describes how two tools that were developed quite |
| 25 | independently gained in power by a well-designed connection between |
| 26 | them. The tools are Python, an interpreted prototyping language, and |
| 27 | AIL, a Remote Procedure Call stub generator. The context is Amoeba, a |
| 28 | well-known distributed operating system developed jointly by the Free |
| 29 | University and CWI in Amsterdam. |
| 30 | |
| 31 | As a consequence of their integration, both tools have profited: |
| 32 | Python gained usability when used with Amoeba --- for which it was not |
| 33 | specifically developed --- and AIL users now have a powerful |
| 34 | interactive tool to test servers and to experiment with new |
| 35 | client/server interfaces.% |
| 36 | \footnote{ |
| 37 | An earlier version of this paper was presented at the Spring 1991 |
| 38 | EurOpen Conference in Troms{\o} under the title ``Linking a Stub |
| 39 | Generator (AIL) to a Prototyping Language (Python).'' |
| 40 | } |
| 41 | \end{abstract} |
| 42 | |
| 43 | \section{Introduction} |
| 44 | |
| 45 | Remote Procedure Call (RPC) interfaces, used in distributed systems |
| 46 | like Amoeba |
| 47 | \cite{Amoeba:IEEE,Amoeba:CACM}, |
| 48 | have a much more concrete character than local procedure call |
| 49 | interfaces in traditional systems. Because clients and servers may |
| 50 | run on different machines, with possibly different word size, byte |
| 51 | order, etc., much care is needed to describe interfaces exactly and to |
| 52 | implement them in such a way that they continue to work when a client |
| 53 | or server is moved to a different machine. Since machines may fail |
| 54 | independently, error handling must also be treated more carefully. |
| 55 | |
| 56 | A common approach to such problems is to use a {\em stub generator}. |
| 57 | This is a program that takes an interface description and transforms |
| 58 | it into functions that must be compiled and linked with client and |
| 59 | server applications. These functions are called by the application |
| 60 | code to take care of details of interfacing to the system's RPC layer, |
| 61 | to implement transformations between data representations of different |
| 62 | machines, to check for errors, etc. They are called `stubs' because |
| 63 | they don't actually perform the action that they are called for but |
| 64 | only relay the parameters to the server |
| 65 | \cite{RPC}. |
| 66 | |
| 67 | Amoeba's stub generator is called AIL, which stands for Amoeba |
| 68 | Interface Language |
| 69 | \cite{AIL}. |
| 70 | The first version of AIL generated only C functions, but an explicit |
| 71 | goal of AIL's design was {\em retargetability}: it should be possible |
| 72 | to add back-ends that generate stubs for different languages from the |
| 73 | same interface descriptions. Moreover, the stubs generated by |
| 74 | different back-ends must be {\em interoperable}: a client written in |
| 75 | Modula-3, say, should be able to use a server written in C, and vice |
| 76 | versa. |
| 77 | |
| 78 | This interoperability is the key to the success of the marriage |
| 79 | between AIL and Python. Python is a versatile interpreted language |
| 80 | developed by the first author. Originally intended as an alternative |
| 81 | for the kind of odd jobs that are traditionally solved by a mixture of |
| 82 | shell scripts, manually given shell commands, and an occasional ad hoc |
| 83 | C program, Python has evolved into a general interactive prototyping |
| 84 | language. It has been applied to a wide range of problems, from |
| 85 | replacements for large shell scripts to fancy graphics demos and |
| 86 | multimedia applications. |
| 87 | |
| 88 | One of Python's strengths is the ability for the user to type in some |
| 89 | code and immediately run it: no compilation or linking is necessary. |
| 90 | Interactive performance is further enhanced by Python's concise, clear |
| 91 | syntax, its very-high-level data types, and its lack of declarations |
| 92 | (which is compensated by run-time type checking). All this makes |
| 93 | programming in Python feel like a leisure trip compared to the hard |
| 94 | work involved in writing and debugging even a smallish C program. |
| 95 | |
| 96 | It should be clear by now that Python will be the ideal tool to test |
| 97 | servers and their interfaces. Especially during the development of a |
| 98 | complex server, one often needs to generate test requests on an ad hoc |
| 99 | basis, to answer questions like ``what happens if request X arrives |
| 100 | when the server is in state Y,'' to test the behavior of the server |
| 101 | with requests that touch its limitations, to check server responses to |
| 102 | all sorts of wrong requests, etc. Python's ability to immediately |
| 103 | execute `improvised' code makes it a much better tool for this |
| 104 | situation than C. |
| 105 | |
| 106 | The link to AIL extends Python with the necessary functionality to |
| 107 | connect to arbitrary servers, making the server testbed sketched above |
| 108 | a reality. Python's high-level data types, general programming |
| 109 | features, and system interface ensure that it has all the power and |
| 110 | flexibility needed for the job. |
| 111 | |
| 112 | One could go even further than this. Current distributed operating |
| 113 | systems, based on client-server interaction, all lack a good command |
| 114 | language or `shell' to give adequate access to available services. |
| 115 | Python has considerable potential for becoming such a shell. |
| 116 | |
| 117 | \subsection{Overview of this Paper} |
| 118 | |
| 119 | The rest of this paper contains three major sections and a conclusion. |
| 120 | First an overview of the Python programming language is given. Next |
| 121 | comes a short description of AIL, together with some relevant details |
| 122 | about Amoeba. Finally, the design and construction of the link |
| 123 | between Python and AIL is described in much detail. The conclusion |
| 124 | looks back at the work and points out weaknesses and strengths of |
| 125 | Python and AIL that were discovered in the process. |
| 126 | |
| 127 | \section{An Overview of Python} |
| 128 | |
| 129 | Python% |
| 130 | \footnote{ |
| 131 | Named after the funny TV show, not the nasty reptile. |
| 132 | } |
| 133 | owes much to ABC |
| 134 | \cite{ABC}, |
| 135 | a language developed at CWI as a programming language for non-expert |
| 136 | computer users. Python borrows freely from ABC's syntax and data |
| 137 | types, but adds modules, exceptions and classes, extensibility, and |
| 138 | the ability to call system functions. The concepts of modules, |
| 139 | exceptions and (to some extent) classes are influenced strongly by |
| 140 | their occurrence in Modula-3 |
| 141 | \cite{Modula-3}. |
| 142 | |
| 143 | Although Python resembles ABC in many ways, there is a a clear |
| 144 | difference in application domain. ABC is intended to be the only |
| 145 | programming language for those who use a computer as a tool, but |
| 146 | occasionally need to write a program. For this reason, ABC is not |
| 147 | just a programming language but also a programming environment, which |
| 148 | comes with an integrated syntax-directed editor and some source |
| 149 | manipulation commands. Python, on the other hand, aims to be a tool |
| 150 | for professional (system) programmers, for whom having a choice of |
| 151 | languages with different feature sets makes it possible to choose `the |
| 152 | right tool for the job.' The features added to Python make it more |
| 153 | useful than ABC in an environment where access to system functions |
| 154 | (such as file and directory manipulations) are common. They also |
| 155 | support the building of larger systems and libraries. The Python |
| 156 | implementation offers little in the way of a programming environment, |
| 157 | but is designed to integrate seamlessly with existing programming |
| 158 | environments (e.g. UNIX and Emacs). |
| 159 | |
| 160 | Perhaps the best introduction to Python is a short example. The |
| 161 | following is a complete Python program to list the contents of a UNIX |
| 162 | directory. |
| 163 | \begin{verbatim} |
| 164 | import sys, posix |
| 165 | |
| 166 | def ls(dirname): # Print sorted directory contents |
| 167 | names = posix.listdir(dirname) |
| 168 | names.sort() |
| 169 | for name in names: |
| 170 | if name[0] != '.': print name |
| 171 | |
| 172 | ls(sys.argv[1]) |
| 173 | \end{verbatim} |
| 174 | The largest part of this program, in the middle starting with {\tt |
| 175 | def}, is a function definition. It defines a function named {\tt ls} |
| 176 | with a single parameter called {\tt dirname}. (Comments in Python |
| 177 | start with `\#' and extend to the end of the line.) The function body |
| 178 | is indented: Python uses indentation for statement grouping instead of |
| 179 | braces or begin/end keywords. This is shorter to type and avoids |
| 180 | frustrating mismatches between the perception of grouping by the user |
| 181 | and the parser. Python accepts one statement per line; long |
| 182 | statements may be broken in pieces using the standard backslash |
| 183 | convention. If the body of a compound statement is a single, simple |
| 184 | statement, it may be placed on the same line as the head. |
| 185 | |
| 186 | The first statement of the function body calls the function {\tt |
| 187 | listdir} defined in the module {\tt posix}. This function returns a |
| 188 | list of strings representing the contents of the directory name passed |
| 189 | as a string argument, here the argument {\tt dirname}. If {\tt |
| 190 | dirname} were not a valid directory name, or perhaps not even a |
| 191 | string, {\tt listdir} would raise an exception and the next statement |
| 192 | would never be reached. (Exceptions can be caught in Python; see |
| 193 | later.) Assuming {\tt listdir} returns normally, its result is |
| 194 | assigned to the local variable {\tt names}. |
| 195 | |
| 196 | The second statement calls the method {\tt sort} of the variable {\tt |
| 197 | names}. This method is defined for all lists in Python and does the |
| 198 | obvious thing: the elements of the list are reordered according to |
| 199 | their natural ordering relationship. Since in our example the list |
Guido van Rossum | d983cde | 1995-03-15 12:53:31 +0000 | [diff] [blame] | 200 | contains strings, they are sorted in ascending \ASCII{} order. |
Guido van Rossum | 2bbb3c0 | 1992-02-11 15:52:24 +0000 | [diff] [blame] | 201 | |
| 202 | The last two lines of the function contain a loop that prints all |
| 203 | elements of the list whose first character isn't a period. In each |
| 204 | iteration, the {\tt for} statement assigns an element of the list to |
| 205 | the local variable {\tt name}. The {\tt print} statement is intended |
| 206 | for simple-minded output; more elaborate formatting is possible with |
| 207 | Python's string handling functions. |
| 208 | |
| 209 | The other two parts of the program are easily explained. The first |
| 210 | line is an {\tt import} statement that tells the interpreter to import |
| 211 | the modules {\tt sys} and {\tt posix}. As it happens these are both |
| 212 | built into the interpreter. Importing a module (built-in or |
| 213 | otherwise) only makes the module name available in the current scope; |
| 214 | functions and data defined in the module are accessed through the dot |
| 215 | notation as in {\tt posix.listdir}. The scope rules of Python are |
| 216 | such that the imported module name {\tt posix} is also available in |
| 217 | the function {\tt ls} (this will be discussed in more detail later). |
| 218 | |
| 219 | Finally, the last line of the program calls the {\tt ls} function with |
| 220 | a definite argument. It must be last since Python objects must be |
| 221 | defined before they can be used; in particular, the function {\tt ls} |
| 222 | must be defined before it can be called. The argument to {\tt ls} is |
| 223 | {\tt sys.argv[1]}, which happens to be the Python equivalent of {\tt |
| 224 | \$1} in a shell script or {\tt argv[1]} in a C program's {\tt main} |
| 225 | function. |
| 226 | |
| 227 | \subsection{Python Data Types} |
| 228 | |
| 229 | (This and the following subsections describe Python in quite a lot of |
| 230 | detail. If you are more interested in AIL, Amoeba and how they are |
| 231 | linked with Python, you can skip to section 3 now.) |
| 232 | |
| 233 | Python's syntax may not have big surprises (which is exactly as it |
| 234 | should be), but its data types are quite different from what is found |
| 235 | in languages like C, Ada or Modula-3. All data types in Python, even |
| 236 | integers, are `objects'. All objects participate in a common garbage |
| 237 | collection scheme (currently implemented using reference counting). |
| 238 | Assignment is cheap, independent of object size and type: only a |
| 239 | pointer to the assigned object is stored in the assigned-to variable. |
| 240 | No type checking is performed on assignment; only specific operations |
| 241 | like addition test for particular operand types. |
| 242 | |
| 243 | The basic object types in Python are numbers, strings, tuples, lists |
| 244 | and dictionaries. Some other object types are open files, functions, |
| 245 | modules, classes, and class instances; even types themselves are |
| 246 | represented as objects. Extension modules written in C can define |
| 247 | additional object types; examples are objects representing windows and |
| 248 | Amoeba capabilities. Finally, the implementation itself makes heavy |
| 249 | use of objects, and defines some private object types that aren't |
| 250 | normally visible to the user. There is no explicit pointer type in |
| 251 | Python. |
| 252 | |
| 253 | {\em Numbers}, both integers and floating point, are pretty |
| 254 | straightforward. The notation for numeric literals is the same as in |
| 255 | C, including octal and hexadecimal integers; precision is the same as |
| 256 | {\tt long} or {\tt double} in C\@. A third numeric type, `long |
| 257 | integer', written with an `L' suffix, can be used for arbitrary |
| 258 | precision calculations. All arithmetic, shifting and masking |
| 259 | operations from C are supported. |
| 260 | |
| 261 | {\em Strings} are `primitive' objects just like numbers. String |
| 262 | literals are written between single quotes, using similar escape |
| 263 | sequences as in C\@. Operations are built into the language to |
| 264 | concatenate and to replicate strings, to extract substrings, etc. |
| 265 | There is no limit to the length of the strings created by a program. |
| 266 | There is no separate character data type; strings of length one do |
| 267 | nicely. |
| 268 | |
| 269 | {\em Tuples} are a way to `pack' small amounts of heterogeneous data |
| 270 | together and carry them around as a unit. Unlike structure members in |
| 271 | C, tuple items are nameless. Packing and unpacking assignments allow |
| 272 | access to the items, for example: |
| 273 | \begin{verbatim} |
| 274 | x = 'Hi', (1, 2), 'World' # x is a 3-item tuple, |
| 275 | # its middle item is (1, 2) |
| 276 | p, q, r = x # unpack x into p, q and r |
| 277 | a, b = q # unpack q into a and b |
| 278 | \end{verbatim} |
| 279 | A combination of packing and unpacking assignment can be used as |
| 280 | parallel assignment, and is idiom for permutations, e.g.: |
| 281 | \begin{verbatim} |
| 282 | p, q = q, p # swap without temporary |
| 283 | a, b, c = b, c, a # cyclic permutation |
| 284 | \end{verbatim} |
| 285 | Tuples are also used for function argument lists if there is more than |
| 286 | one argument. A tuple object, once created, cannot be modified; but |
| 287 | it is easy enough to unpack it and create a new, modified tuple from |
| 288 | the unpacked items and assign this to the variable that held the |
| 289 | original tuple object (which will then be garbage-collected). |
| 290 | |
| 291 | {\em Lists} are array-like objects. List items may be arbitrary |
| 292 | objects and can be accessed and changed using standard subscription |
| 293 | notation. Lists support item insertion and deletion, and can |
| 294 | therefore be used as queues, stacks etc.; there is no limit to their |
| 295 | size. |
| 296 | |
| 297 | Strings, tuples and lists together are {\em sequence} types. These |
| 298 | share a common notation for generic operations on sequences such as |
| 299 | subscription, concatenation, slicing (taking subsequences) and |
| 300 | membership tests. As in C, subscripts start at 0. |
| 301 | |
| 302 | {\em Dictionaries} are `mappings' from one domain to another. The |
| 303 | basic operations on dictionaries are item insertion, extraction and |
| 304 | deletion, using subscript notation with the key as subscript. (The |
| 305 | current implementation allows only strings in the key domain, but a |
| 306 | future version of the language may remove this restriction.) |
| 307 | |
| 308 | \subsection{Statements} |
| 309 | |
| 310 | Python has various kinds of simple statements, such as assignments |
| 311 | and {\tt print} statements, and several kinds of compound statements, |
| 312 | like {\tt if} and {\tt for} statements. Formally, function |
| 313 | definitions and {\tt import} statements are also statements, and there |
| 314 | are no restrictions on the ordering of statements or their nesting: |
| 315 | {\tt import} may be used inside a function, functions may be defined |
| 316 | conditionally using an {\tt if} statement, etc. The effect of a |
| 317 | declaration-like statement takes place only when it is executed. |
| 318 | |
| 319 | All statements except assignments and expression statements begin with |
| 320 | a keyword: this makes the language easy to parse. An overview of the |
| 321 | most common statement forms in Python follows. |
| 322 | |
| 323 | An {\em assignment} has the general form |
| 324 | \vspace{\itemsep} |
| 325 | |
| 326 | \noindent |
| 327 | {\em variable $=$ variable $= ... =$ variable $=$ expression} |
| 328 | \vspace{\itemsep} |
| 329 | |
| 330 | It assigns the value of the expression to all listed variables. (As |
| 331 | shown in the section on tuples, variables and expressions can in fact |
| 332 | be comma-separated lists.) The assignment operator is not an |
| 333 | expression operator; there are no horrible things in Python like |
| 334 | \begin{verbatim} |
| 335 | while (p = p->next) { ... } |
| 336 | \end{verbatim} |
| 337 | Expression syntax is mostly straightforward and will not be explained |
| 338 | in detail here. |
| 339 | |
| 340 | An {\em expression statement} is just an expression on a line by |
| 341 | itself. This writes the value of the expression to standard output, |
| 342 | in a suitably unambiguous way, unless it is a `procedure call' (a |
| 343 | function call that returns no value). Writing the value is useful |
| 344 | when Python is used in `calculator mode', and reminds the programmer |
| 345 | not to ignore function results. |
| 346 | |
| 347 | The {\tt if} statement allows conditional execution. It has optional |
| 348 | {\tt elif} and {\tt else} parts; a construct like {\tt |
| 349 | if...elif...elif...elif...else} can be used to compensate for the |
| 350 | absence of a {\em switch} or {\em case} statement. |
| 351 | |
| 352 | Looping is done with {\tt while} and {\tt for} statements. The latter |
| 353 | (demonstrated in the `ls' example earlier) iterates over the elements |
| 354 | of a `sequence' (see the discussion of data types below). It is |
| 355 | possible to terminate a loop with a {\tt break} statement or to start |
| 356 | the next iteration with {\tt continue}. Both looping statements have |
| 357 | an optional {\tt else} clause which is executed after the loop is |
| 358 | terminated normally, but skipped when it is terminated by {\tt break}. |
| 359 | This can be handy for searches, to handle the case that the item is |
| 360 | not found. |
| 361 | |
| 362 | Python's {\em exception} mechanism is modelled after that of Modula-3. |
| 363 | Exceptions are raised by the interpreter when an illegal operation is |
| 364 | tried. It is also possible to explicitly raise an exception with the |
| 365 | {\tt raise} statement: |
| 366 | \vspace{\itemsep} |
| 367 | |
| 368 | \noindent |
| 369 | {\tt raise {\em expression}, {\em expression}} |
| 370 | \vspace{\itemsep} |
| 371 | |
| 372 | The first expression identifies which exception should be raised; |
| 373 | there are several built-in exceptions and the user may define |
| 374 | additional ones. The second, optional expression is passed to the |
| 375 | handler, e.g. as a detailed error message. |
| 376 | |
| 377 | Exceptions may be handled (caught) with the {\tt try} statement, which |
| 378 | has the following general form: |
| 379 | \vspace{\itemsep} |
| 380 | |
| 381 | \noindent |
| 382 | {\tt |
| 383 | \begin{tabular}{l} |
| 384 | try: {\em block} \\ |
| 385 | except {\em expression}, {\em variable}: {\em block} \\ |
| 386 | except {\em expression}, {\em variable}: {\em block} \\ |
| 387 | ... \\ |
| 388 | except: {\em block} |
| 389 | \end{tabular} |
| 390 | } |
| 391 | \vspace{\itemsep} |
| 392 | |
| 393 | When an exception is raised during execution of the first block, a |
| 394 | search for an exception handler starts. The first {\tt except} clause |
| 395 | whose {\em expression} matches the exception is executed. The |
| 396 | expression may specify a list of exceptions to match against. A |
| 397 | handler without an expression serves as a `catch-all'. If there is no |
| 398 | match, the search for a handler continues with outer {\tt try} |
| 399 | statements; if no match is found on the entire invocation stack, an |
| 400 | error message and stack trace are printed, and the program is |
| 401 | terminated (interactively, the interpreter returns to its main loop). |
| 402 | |
| 403 | Note that the form of the {\tt except} clauses encourages a style of |
| 404 | programming whereby only selected exceptions are caught, passing |
| 405 | unanticipated exceptions on to the caller and ultimately to the user. |
| 406 | This is preferable over a simpler `catch-all' error handling |
| 407 | mechanism, where a simplistic handler intended to catch a single type |
| 408 | of error like `file not found' can easily mask genuine programming |
| 409 | errors --- especially in a language like Python which relies strongly |
| 410 | on run-time checking and allows the catching of almost any type of |
| 411 | error. |
| 412 | |
| 413 | Other common statement forms, which we have already encountered, are |
| 414 | function definitions, {\tt import} statements and {\tt print} |
| 415 | statements. There is also a {\tt del} statement to delete one or more |
| 416 | variables, a {\tt return} statement to return from a function, and a |
| 417 | {\tt global} statement to allow assignments to global variables. |
| 418 | Finally, the {\tt pass} statement is a no-op. |
| 419 | |
| 420 | \subsection{Execution Model} |
| 421 | |
| 422 | A Python program is executed by a stack-based interpreter. |
| 423 | |
| 424 | When a function is called, a new `execution environment' for it is |
| 425 | pushed onto the stack. An execution environment contains (among other |
| 426 | data) pointers to two `symbol tables' that are used to hold variables: |
| 427 | the local and the global symbol table. The local symbol table |
| 428 | contains local variables of the current function invocation (including |
| 429 | the function arguments); the global symbol table contains variables |
| 430 | defined in the module containing the current function. |
| 431 | |
| 432 | The `global' symbol table is thus only global with respect to the |
| 433 | current function. There are no system-wide global variables; using |
| 434 | the {\tt import} statement it is easy enough to reference variables |
| 435 | that are defined in other modules. A system-wide read-only symbol |
| 436 | table is used for built-in functions and constants though. |
| 437 | |
| 438 | On assignment to a variable, by default an entry for it is made in the |
| 439 | local symbol table of the current execution environment. The {\tt |
| 440 | global} command can override this (it is not enough that a global |
| 441 | variable by the same name already exists). When a variable's value is |
| 442 | needed, it is searched first in the local symbol table, then in the |
| 443 | global one, and finally in the symbol table containing built-in |
| 444 | functions and constants. |
| 445 | |
| 446 | The term `variable' in this context refers to any name: functions and |
| 447 | imported modules are searched in exactly the same way. |
| 448 | |
| 449 | Names defined in a module's symbol table survive until the end of the |
| 450 | program. This approximates the semantics of file-static global |
| 451 | variables in C or module variables in Modula-3. A module is |
| 452 | initialized the first time it is imported, by executing the text of |
| 453 | the module as a parameterless function whose local and global symbol |
| 454 | tables are the same, so names are defined in module's symbol table. |
| 455 | (Modules implemented in C have another way to define symbols.) |
| 456 | |
| 457 | A Python main program is read from standard input or from a script |
| 458 | file passed as an argument to the interpreter. It is executed as if |
| 459 | an anonymous module was imported. Since {\tt import} statements are |
| 460 | executed like all other statements, the initialization order of the |
| 461 | modules used in a program is defined by the flow of control through |
| 462 | the program. |
| 463 | |
| 464 | The `attribute' notation {\em m.name}, where {\em m} is a module, |
| 465 | accesses the symbol {\em name} in that module's symbol table. It can |
| 466 | be assigned to as well. This is in fact a special case of the |
| 467 | construct {\em x.name} where {\em x} denotes an arbitrary object; the |
| 468 | type of {\em x} determines how this is to be interpreted, and what |
| 469 | assignment to it means. |
| 470 | |
| 471 | For instance, when {\tt a} is a list object, {\tt a.append} yields a |
| 472 | built-in `method' object which, when called, appends an item to {\tt a}. |
| 473 | (If {\tt a} and {\tt b} are distinct list objects, {\tt a.append} and |
| 474 | {\tt b.append} are distinguishable method objects.) Normally, in |
| 475 | statements like {\tt a.append(x)}, the method object {\tt a.append} is |
| 476 | called and then discarded, but this is a matter of convention. |
| 477 | |
| 478 | List attributes are read-only --- the user cannot define new list |
| 479 | methods. Some objects, like numbers and strings, have no attributes |
| 480 | at all. Like all type checking in Python, the meaning of an attribute |
| 481 | is determined at run-time --- when the parser sees {\em x.name}, it |
| 482 | has no idea of the type of {\em x}. Note that {\em x} here does not |
| 483 | have to be a variable --- it can be an arbitrary (perhaps |
| 484 | parenthesized) expression. |
| 485 | |
| 486 | Given the flexibility of the attribute notation, one is tempted to use |
| 487 | methods to replace all standard operations. Yet, Python has kept a |
| 488 | small repertoire of built-in functions like {\tt len()} and {\tt |
| 489 | abs()}. The reason is that in some cases the function notation is |
| 490 | more familiar than the method notation; just like programs would |
| 491 | become less readable if all infix operators were replaced by function |
| 492 | calls, they would become less readable if all function calls had to be |
| 493 | replaced by method calls (and vice versa!). |
| 494 | |
| 495 | The choice whether to make something a built-in function or a method |
| 496 | is a matter of taste. For arithmetic and string operations, function |
| 497 | notation is preferred, since frequently the argument to such an |
| 498 | operation is an expression using infix notation, as in {\tt abs(a+b)}; |
| 499 | this definitely looks better than {\tt (a+b).abs()}. The choice |
| 500 | between make something a built-in function or a function defined in a |
| 501 | built-in method (requiring {\tt import}) is similarly guided by |
| 502 | intuition; all in all, only functions needed by `general' programming |
| 503 | techniques are built-in functions. |
| 504 | |
| 505 | \subsection{Classes} |
| 506 | |
| 507 | Python has a class mechanism distinct from the object-orientation |
| 508 | already explained. A class in Python is not much more than a |
| 509 | collection of methods and a way to create class instances. Class |
| 510 | methods are ordinary functions whose first parameter is the class |
| 511 | instance; they are called using the method notation. |
| 512 | |
| 513 | For instance, a class can be defined as follows: |
| 514 | \begin{verbatim} |
| 515 | class Foo: |
| 516 | def meth1(self, arg): ... |
| 517 | def meth2(self): ... |
| 518 | \end{verbatim} |
| 519 | A class instance is created by |
| 520 | {\tt x = Foo()} |
| 521 | and its methods can be called thus: |
| 522 | \begin{verbatim} |
| 523 | x.meth1('Hi There!') |
| 524 | x.meth2() |
| 525 | \end{verbatim} |
| 526 | The functions used as methods are also available as attributes of the |
| 527 | class object, and the above method calls could also have been written |
| 528 | as follows: |
| 529 | \begin{verbatim} |
| 530 | Foo.meth1(x, 'Hi There!') |
| 531 | Foo.meth2(x) |
| 532 | \end{verbatim} |
| 533 | Class methods can store instance data by assigning to instance data |
| 534 | attributes, e.g.: |
| 535 | \begin{verbatim} |
| 536 | self.size = 100 |
| 537 | self.title = 'Dear John' |
| 538 | \end{verbatim} |
| 539 | Data attributes do not have to be declared; as with local variables, |
| 540 | they spring into existence when assigned to. It is a matter of |
| 541 | discretion to avoid name conflicts with method names. This facility |
| 542 | is also available to class users; instances of a method-less class can |
| 543 | be used as records with named fields. |
| 544 | |
| 545 | There is no built-in mechanism for instance initialization. Classes |
| 546 | by convention provide an {\tt init()} method which initializes the |
| 547 | instance and then returns it, so the user can write |
| 548 | \begin{verbatim} |
| 549 | x = Foo().init('Dr. Strangelove') |
| 550 | \end{verbatim} |
| 551 | |
| 552 | Any user-defined class can be used as a base class to derive other |
| 553 | classes. However, built-in types like lists cannot be used as base |
Guido van Rossum | 16d6e71 | 1994-08-08 12:30:22 +0000 | [diff] [blame] | 554 | classes. (Incidentally, the same is true in \Cpp{} and Modula-3.) A |
Guido van Rossum | 2bbb3c0 | 1992-02-11 15:52:24 +0000 | [diff] [blame] | 555 | class may override any method of its base classes. Instance methods |
| 556 | are first searched in the method list of their class, and then, |
| 557 | recursively, in the method lists of their base class. Initialization |
| 558 | methods of derived classes should explicitly call the initialization |
| 559 | methods of their base class. |
| 560 | |
| 561 | A simple form of multiple inheritance is also supported: a class can |
| 562 | have multiple base classes, but the language rules for resolving name |
| 563 | conflicts are somewhat simplistic, and consequently the feature has so |
| 564 | far found little usage. |
| 565 | |
| 566 | \subsection{The Python Library} |
| 567 | |
| 568 | Python comes with an extensive library, structured as a collection of |
| 569 | modules. A few modules are built into the interpreter: these |
| 570 | generally provide access to system libraries implemented in C such as |
| 571 | mathematical functions or operating system calls. Two built-in |
| 572 | modules provide access to internals of the interpreter and its |
| 573 | environment. Even abusing these internals will at most cause an |
| 574 | exception in the Python program; the interpreter will not dump core |
| 575 | because of errors in Python code. |
| 576 | |
| 577 | Most modules however are written in Python and distributed with the |
| 578 | interpreter; they provide general programming tools like string |
| 579 | operations and random number generators, provide more convenient |
| 580 | interfaces to some built-in modules, or provide specialized services |
| 581 | like a {\em getopt}-style command line option processor for |
| 582 | stand-alone scripts. |
| 583 | |
| 584 | There are also some modules written in Python that dig deep in the |
| 585 | internals of the interpreter; there is a module to browse the stack |
| 586 | backtrace when an unhandled exception has occurred, one to disassemble |
| 587 | the internal representation of Python code, and even an interactive |
| 588 | source code debugger which can trace Python code, set breakpoints, |
| 589 | etc. |
| 590 | |
| 591 | \subsection{Extensibility} |
| 592 | |
| 593 | It is easy to add new built-in modules written in C to the Python |
| 594 | interpreter. Extensions appear to the Python user as built-in |
| 595 | modules. Using a built-in module is no different from using a module |
| 596 | written in Python, but obviously the author of a built-in module can |
| 597 | do things that cannot be implemented purely in Python. |
| 598 | |
| 599 | In particular, built-in modules can contain Python-callable functions |
| 600 | that call functions from particular system libraries (`wrapper |
| 601 | functions'), and they can define new object types. In general, if a |
| 602 | built-in module defines a new object type, it should also provide at |
| 603 | least one function that creates such objects. Attributes of such |
| 604 | object types are also implemented in C; they can return data |
| 605 | associated with the object or methods, implemented as C functions. |
| 606 | |
| 607 | For instance, an extension was created for Amoeba: it provides wrapper |
| 608 | functions for the basic Amoeba name server functions, and defines a |
| 609 | `capability' object type, whose methods are file server operations. |
| 610 | Another extension is a built-in module called {\tt posix}; it provides |
| 611 | wrappers around post UNIX system calls. Extension modules also |
| 612 | provide access to two different windowing/graphics interfaces: STDWIN |
| 613 | \cite{STDWIN} |
| 614 | (which connects to X11 on UNIX and to the Mac Toolbox on the |
| 615 | Macintosh), and the Graphics Library (GL) for Silicon Graphics |
| 616 | machines. |
| 617 | |
| 618 | Any function in an extension module is supposed to type-check its |
| 619 | arguments; the interpreter contains a convenience function to |
| 620 | facilitate extracting C values from arguments and type-checking them |
| 621 | at the same time. Returning values is also painless, using standard |
| 622 | functions to create Python objects from C values. |
| 623 | |
| 624 | On some systems extension modules may be dynamically loaded, thus |
| 625 | avoiding the need to maintain a private copy of the Python interpreter |
| 626 | in order to use a private extension. |
| 627 | |
| 628 | \section{A Short Description of AIL and Amoeba} |
| 629 | |
| 630 | An RPC stub generator takes an interface description as input. The |
| 631 | designer of a stub generator has at least two choices for the input |
| 632 | language: use a suitably restricted version of the target language, or |
| 633 | design a new language. The first solution was chosen, for instance, |
| 634 | by the designers of Flume, the stub generator for the Topaz |
| 635 | distributed operating system built at DEC SRC |
| 636 | \cite{Flume,Evolving}. |
| 637 | |
| 638 | Flume's one and only target language is Modula-2+ (the predecessor of |
| 639 | Modula-3). Modula-2+, like Modula-N for any N, has an interface |
| 640 | syntax that is well suited as a stub generator input language: an |
| 641 | interface module declares the functions that are `exported' by a |
| 642 | module implementation, with their parameter and return types, plus the |
| 643 | types and constants used for the parameters. Therefore, the input to |
| 644 | Flume is simply a Modula-2+ interface module. But even in this ideal |
| 645 | situation, an RPC stub generator needs to know things about functions |
| 646 | that are not stated explicitly in the interface module: for instance, |
| 647 | the transfer direction of VAR parameters (IN, OUT or both) is not |
| 648 | given. Flume solves this and other problems by a mixture of |
| 649 | directives hidden in comments and a convention for the names of |
| 650 | objects. Thus, one could say that the designers of Flume really |
| 651 | created a new language, even though it looks remarkably like their |
| 652 | target language. |
| 653 | |
| 654 | \subsection{The AIL Input Language} |
| 655 | |
| 656 | Amoeba uses C as its primary programming language. C function |
| 657 | declarations (at least in `Classic' C) don't specify the types of |
| 658 | the parameters, let alone their transfer direction. Using this as |
| 659 | input for a stub generator would require almost all information for |
| 660 | the stub generator to be hidden inside comments, which would require a |
| 661 | rather contorted scanner. Therefore we decided to design the input |
| 662 | syntax for Amoeba's stub generator `from scratch'. This gave us the |
| 663 | liberty to invent proper syntax not only for the transfer direction of |
| 664 | parameters, but also for variable-length arrays. |
| 665 | |
| 666 | On the other hand we decided not to abuse our freedom, and borrowed as |
| 667 | much from C as we could. For instance, AIL runs its input through the |
| 668 | C preprocessor, so we get macros, include files and conditional |
| 669 | compilation for free. AIL's type declaration syntax is a superset of |
| 670 | C's, so the user can include C header files to use the types declared |
| 671 | there as function parameter types --- which are declared using |
Guido van Rossum | 16d6e71 | 1994-08-08 12:30:22 +0000 | [diff] [blame] | 672 | function prototypes as in \Cpp{} or Standard C\@. It should be clear by |
Guido van Rossum | 2bbb3c0 | 1992-02-11 15:52:24 +0000 | [diff] [blame] | 673 | now that AIL's lexical conventions are also identical to C's. The |
| 674 | same is true for its expression syntax. |
| 675 | |
| 676 | Where does AIL differ from C, then? Function declarations in AIL are |
| 677 | grouped in {\em classes}. Classes in AIL are mostly intended as a |
| 678 | grouping mechanism: all functions implemented by a server are grouped |
| 679 | together in a class. Inheritance is used to form new groups by adding |
| 680 | elements to existing groups; multiple inheritance is supported to join |
| 681 | groups together. Classes can also contain constant and type |
| 682 | definitions, and one form of output that AIL can generate is a header |
| 683 | file for use by C programmers who wish to use functions from a |
| 684 | particular AIL class. |
| 685 | |
| 686 | Let's have a look at some (unrealistically simple) class definitions: |
| 687 | \begin{verbatim} |
| 688 | #include <amoeba.h> /* Defines `capability', etc. */ |
| 689 | |
| 690 | class standard_ops [1000 .. 1999] { |
| 691 | /* Operations supported by most interfaces */ |
| 692 | std_info(*, out char buf[size:100], out int size); |
| 693 | std_destroy(*); |
| 694 | }; |
| 695 | \end{verbatim} |
| 696 | This defines a class called `standard\_ops' whose request codes are |
| 697 | chosen by AIL from the range 1000-1999. Request codes are small |
| 698 | integers used to identify remote operations. The author of the class |
| 699 | must specify a range from which AIL chooses, and class authors must |
| 700 | make sure they avoid conflicts, e.g. by using an `assigned number |
| 701 | administration office'. In the example, `std\_info' will be assigned |
| 702 | request code 1000 and `std\_destroy' will get code 1001. There is |
| 703 | also an option to explicitly assign request codes, for compatibility |
| 704 | with servers with manually written interfaces. |
| 705 | |
| 706 | The class `standard\_ops' defines two operations, `std\_info' and |
| 707 | `std\_destroy'. The first parameter of each operation is a star |
| 708 | (`*'); this is a placeholder for a capability that must be passed when |
| 709 | the operation is called. The description of Amoeba below explains the |
| 710 | meaning and usage of capabilities; for now, it is sufficient to know |
| 711 | that a capability is a small structure that uniquely identifies an |
| 712 | object and a server or service. |
| 713 | |
| 714 | The standard operation `std\_info' has two output parameters: a |
| 715 | variable-size character buffer (which will be filled with a short |
| 716 | descriptive string of the object to which the operation is applied) |
| 717 | and an integer giving the length of this string. The standard |
| 718 | operation `std\_destroy' has no further parameters --- it just |
| 719 | destroys the object, if the caller has the right to do so. |
| 720 | |
| 721 | The next class is called `tty': |
| 722 | \begin{verbatim} |
| 723 | class tty [2000 .. 2099] { |
| 724 | inherit standard_ops; |
| 725 | const TTY_MAXBUF = 1000; |
| 726 | tty_write(*, char buf[size:TTY_MAXBUF], int size); |
| 727 | tty_read(*, out char buf[size:TTY_MAXBUF], out int size); |
| 728 | }; |
| 729 | \end{verbatim} |
| 730 | The request codes for operations defined in this class lie in the |
| 731 | range 2000-2099; inherited operations use the request codes already |
| 732 | assigned to them. The operations defined by this class are |
| 733 | `tty\_read' and `tty\_write', which pass variable-sized data buffers |
| 734 | between client and server. Class `tty' inherits class |
| 735 | `standard\_ops', so tty objects also support the operations |
| 736 | `std\_info' and `std\_destroy'. |
| 737 | |
| 738 | Only the {\em interface} for `std\_info' and `std\_destroy' is shared |
| 739 | between tty objects and other objects whose interface inherits |
| 740 | `standard\_ops'; the implementation may differ. Even multiple |
| 741 | implementations of the `tty' interface may exist, e.g. a driver for a |
| 742 | console terminal and a terminal emulator in a window. To expand on |
| 743 | the latter example, consider: |
| 744 | \begin{verbatim} |
| 745 | class window [2100 .. 2199] { |
| 746 | inherit standard_ops; |
| 747 | win_create(*, int x, int y, int width, int height, |
| 748 | out capability win_cap); |
| 749 | win_reconfigure(*, int x, int y, int width, int height); |
| 750 | }; |
| 751 | |
| 752 | class tty_emulator [2200 .. 2299] { |
| 753 | inherit tty, window; |
| 754 | }; |
| 755 | \end{verbatim} |
| 756 | Here two new interface classes are defined. |
| 757 | Class `window' could be used for creating and manipulating windows. |
| 758 | Note that `win\_create' returns a capability for the new window. |
| 759 | This request should probably should be sent to a generic window |
| 760 | server capability, or it might create a subwindow when applied to a |
| 761 | window object. |
| 762 | |
| 763 | Class `tty\_emulator' demonstrates the essence of multiple inheritance. |
| 764 | It is presumably the interface to a window-based terminal emulator. |
| 765 | Inheritance is transitive, so `tty\_emulator' also implicitly inherits |
| 766 | `standard\_ops'. |
| 767 | In fact, it inherits it twice: once via `tty' and once via `window'. |
| 768 | Since AIL class inheritance only means interface sharing, not |
| 769 | implementation sharing, inheriting the same class multiple times is |
| 770 | never a problem and has the same effect as inheriting it once. |
| 771 | |
Guido van Rossum | 16d6e71 | 1994-08-08 12:30:22 +0000 | [diff] [blame] | 772 | Note that the power of AIL classes doesn't go as far as \Cpp{}. |
Guido van Rossum | 2bbb3c0 | 1992-02-11 15:52:24 +0000 | [diff] [blame] | 773 | AIL classes cannot have data members, and there is |
| 774 | no mechanism for a server that implements a derived class |
| 775 | to inherit the implementation of the base |
| 776 | class --- other than copying the source code. |
| 777 | The syntax for class definitions and inheritance is also different. |
| 778 | |
| 779 | \subsection{Amoeba} |
| 780 | |
| 781 | The smell of `object-orientedness' that the use of classes in AIL |
| 782 | creates matches nicely with Amoeba's object-oriented approach to |
| 783 | RPC\@. In Amoeba, almost all operating system entities (files, |
| 784 | directories, processes, devices etc.) are implemented as {\em |
| 785 | objects}. Objects are managed by {\em services} and represented by |
| 786 | {\em capabilities}. A capability gives its holder access to the |
| 787 | object it represents. Capabilities are protected cryptographically |
| 788 | against forgery and can thus be kept in user space. A capability is a |
| 789 | 128-bit binary string, subdivided as follows: |
| 790 | |
| 791 | % XXX Need a better version of this picture! |
| 792 | \begin{verbatim} |
| 793 | 48 24 8 48 Bits |
| 794 | +----------------+------------+--------+---------------+ |
| 795 | | Service | Object | Perm. | Check | |
| 796 | | port | number | bits | word | |
| 797 | +----------------+------------+--------+---------------+ |
| 798 | \end{verbatim} |
| 799 | |
| 800 | The service port is used by the RPC implementation in the Amoeba |
| 801 | kernel to locate a server implementing the service that manages the |
| 802 | object. In many cases there is a one-to-one correspondence between |
| 803 | servers and services (each service is implemented by exactly one |
| 804 | server process), but some services are replicated. For instance, |
| 805 | Amoeba's directory service, which is crucial for gaining access to most |
| 806 | other services, is implemented by two servers that listen on the same |
| 807 | port and know about exactly the same objects. |
| 808 | |
| 809 | The object number in the capability is used by the server receiving |
| 810 | the request for identifying the object to which the operation applies. |
| 811 | The permission bits specify which operations the holder of the capability |
| 812 | may apply. The last part of a capability is a 48-bit long `check |
| 813 | word', which is used to prevent forgery. The check word is computed |
| 814 | by the server based upon the permission bits and a random key per object |
| 815 | that it keeps secret. If you change the permission bits you must compute |
| 816 | the proper check word or else the server will refuse the capability. |
| 817 | Due to the size of the check word and the nature of the cryptographic |
| 818 | `one-way function' used to compute it, inverting this function is |
| 819 | impractical, so forging capabilities is impossible.% |
| 820 | \footnote{ |
| 821 | As computers become faster, inverting the one-way function becomes |
| 822 | less impractical. |
| 823 | Therefore, a next version of Amoeba will have 64-bit check words. |
| 824 | } |
| 825 | |
| 826 | A working Amoeba system is a collection of diverse servers, managing |
| 827 | files, directories, processes, devices etc. While most servers have |
| 828 | their own interface, there are some requests that make sense for some |
| 829 | or all object types. For instance, the {\em std\_info()} request, |
| 830 | which returns a short descriptive string, applies to all object types. |
| 831 | Likewise, {\em std\_destroy()} applies to files, directories and |
| 832 | processes, but not to devices. |
| 833 | |
| 834 | Similarly, different file server implementations may want to offer the |
| 835 | same interface for operations like {\em read()} and {\em write()} to |
| 836 | their clients. AIL's grouping of requests into classes is ideally |
| 837 | suited to describe this kind of interface sharing, and a class |
| 838 | hierarchy results which clearly shows the similarities between server |
| 839 | interfaces (not necessarily their implementations!). |
| 840 | |
| 841 | The base class of all classes defines the {\em std\_info()} request. |
| 842 | Most server interfaces actually inherit a derived class that also |
| 843 | defines {\em std\_destroy().} File servers inherit a class that |
| 844 | defines the common operations on files, etc. |
| 845 | |
| 846 | \subsection{How AIL Works} |
| 847 | |
| 848 | The AIL stub generator functions in three phases: |
| 849 | \begin{itemize} |
| 850 | \item |
| 851 | parsing, |
| 852 | \item |
| 853 | strategy determination, |
| 854 | \item |
| 855 | code generation. |
| 856 | \end{itemize} |
| 857 | |
| 858 | {\bf Phase one} parses the input and builds a symbol table containing |
| 859 | everything it knows about the classes and other definitions found in |
| 860 | the input. |
| 861 | |
| 862 | {\bf Phase two} determines the strategy to use for each function |
| 863 | declaration in turn and decides upon the request and reply message |
| 864 | formats. This is not a simple matter, because of various optimization |
| 865 | attempts. Amoeba's kernel interface for RPC requests takes a |
| 866 | fixed-size header and one arbitrary-size buffer. A large part of the |
| 867 | header holds the capability of the object to which the request is |
| 868 | directed, but there is some space left for a few integer parameters |
| 869 | whose interpretation is left up to the server. AIL tries to use these |
| 870 | slots for simple integer parameters, for two reasons. |
| 871 | |
| 872 | First, unlike the buffer, header fields are byte-swapped by the RPC |
| 873 | layer in the kernel if necessary, so it saves a few byte swapping |
| 874 | instructions in the user code. Second, and more important, a common |
| 875 | form of request transfers a few integers and one large buffer to or |
| 876 | from a server. The {\em read()} and {\em write()} requests of most |
| 877 | file servers have this form, for instance. If it is possible to place |
| 878 | all integer parameters in the header, the address of the buffer |
| 879 | parameter can be passed directly to the kernel RPC layer. While AIL |
| 880 | is perfectly capable of handling requests that do not fit this format, |
| 881 | the resulting code involves allocating a new buffer and copying all |
| 882 | parameters into it. It is a top priority to avoid this copying |
| 883 | (`marshalling') if at all possible, in order to maintain Amoeba's |
| 884 | famous RPC performance. |
| 885 | |
| 886 | When AIL resorts to copying parameters into a buffer, it reorders them |
| 887 | so that integers indicating the lengths of variable-size arrays are |
| 888 | placed in the buffer before the arrays they describe, since otherwise |
| 889 | decoding the request would be impossible. It also adds occasional |
| 890 | padding bytes to ensure integers are aligned properly in the buffer --- |
| 891 | this can speed up (un)marshalling. |
| 892 | |
| 893 | {\bf Phase three} is the code generator, or back-end. There are in |
| 894 | fact many different back-ends that may be called in a single run to |
| 895 | generate different types of output. The most important output types |
| 896 | are header files (for inclusion by the clients of an interface), |
| 897 | client stubs, and `server main loop' code. The latter decodes |
| 898 | incoming requests in the server. The generated code depends on the |
| 899 | programming language requested, and there are separate back-ends for |
| 900 | each supported language. |
| 901 | |
| 902 | It is important that the strategy chosen by phase two is independent |
| 903 | of the language requested for phase three --- otherwise the |
| 904 | interoperability of servers and clients written in different languages |
| 905 | would be compromised. |
| 906 | |
| 907 | \section{Linking AIL to Python} |
| 908 | |
| 909 | From the previous section it can be concluded that linking AIL to |
| 910 | Python is a matter of writing a back-end for Python. This is indeed |
| 911 | what we did. |
| 912 | |
| 913 | Considerable time went into the design of the back-end in order to |
| 914 | make the resulting RPC interface for Python fit as smoothly as |
| 915 | possible in Python's programming style. For instance, the issues of |
| 916 | parameter transfer, variable-size arrays, error handling, and call |
| 917 | syntax were all solved in a manner that favors ease of use in Python |
| 918 | rather than strict correspondence with the stubs generated for C, |
| 919 | without compromising network-level compatibility. |
| 920 | |
| 921 | \subsection{Mapping AIL Entities to Python} |
| 922 | |
| 923 | For each programming language that AIL is to support, a mapping must |
| 924 | be designed between the data types in AIL and those in that language. |
| 925 | Other aspects of the programming languages, such as differences in |
| 926 | function call semantics, must also be taken care of. |
| 927 | |
| 928 | While the mapping for C is mostly straightforward, the mapping for |
| 929 | Python requires a little thinking to get the best results for Python |
| 930 | programmers. |
| 931 | |
| 932 | \subsubsection{Parameter Transfer Direction} |
| 933 | |
| 934 | Perhaps the simplest issue is that of parameter transfer direction. |
| 935 | Parameters of functions declared in AIL are categorized as being of |
| 936 | type {\tt in}, {\tt out} or {\tt in} {\tt out} (the same distinction |
| 937 | as made in Ada). Python only has call-by-value parameter semantics; |
| 938 | functions can return multiple values as a tuple. This means that, |
| 939 | unlike the C back-end, the Python back-end cannot always generate |
| 940 | Python functions with exactly the same parameter list as the AIL |
| 941 | functions. |
| 942 | |
| 943 | Instead, the Python parameter list consists of all {\tt in} and {\tt |
| 944 | in} {\tt out} parameters, in the order in which they occur in the AIL |
| 945 | parameter list; similarly, the Python function returns a tuple |
| 946 | containing all {\tt in} {\tt out} and {\tt out} parameters. In fact |
| 947 | Python packs function parameters into a tuple as well, stressing the |
| 948 | symmetry between parameters and return value. For example, a stub |
| 949 | with this AIL parameter list: |
| 950 | \begin{verbatim} |
| 951 | (*, in int p1, in out int p2, in int p3, out int p4) |
| 952 | \end{verbatim} |
| 953 | will have the following parameter list and return values in Python: |
| 954 | \begin{verbatim} |
| 955 | (p1, p2, p3) -> (p2, p4) |
| 956 | \end{verbatim} |
| 957 | |
| 958 | \subsubsection{Variable-size Entities} |
| 959 | |
| 960 | The support for variable-size objects in AIL is strongly guided by the |
| 961 | limitations of C in this matter. Basically, AIL allows what is |
| 962 | feasible in C: functions may have variable-size arrays as parameters |
| 963 | (both input or output), provided their length is passed separately. |
| 964 | In practice this is narrowed to the following rule: for each |
| 965 | variable-size array parameter, there must be an integer parameter |
| 966 | giving its length. (An exception for null-terminated strings is |
| 967 | planned but not yet realized.) |
| 968 | |
| 969 | Variable-size arrays in AIL or C correspond to {\em sequences} in |
| 970 | Python: lists, tuples or strings. These are much easier to use than |
| 971 | their C counterparts. Given a sequence object in Python, it is always |
| 972 | possible to determine its size: the built-in function {\tt len()} |
| 973 | returns it. It would be annoying to require the caller of an RPC stub |
| 974 | with a variable-size parameter to also pass a parameter that |
| 975 | explicitly gives its size. Therefore we eliminate all parameters from |
| 976 | the Python parameter list whose value is used as the size of a |
| 977 | variable-size array. Such parameters are easily found: the array |
| 978 | bound expression contains the name of the parameter giving its size. |
| 979 | This requires the stub code to work harder (it has to recover the |
| 980 | value for size parameters from the corresponding sequence parameter), |
| 981 | but at least part of this work would otherwise be needed as well, to |
| 982 | check that the given and actual sizes match. |
| 983 | |
| 984 | Because of the symmetry in Python between the parameter list and the |
| 985 | return value of a function, the same elimination is performed on |
| 986 | return values containing variable-size arrays: integers returned |
| 987 | solely to tell the client the size of a returned array are not |
| 988 | returned explicitly to the caller in Python. |
| 989 | |
| 990 | \subsubsection{Error Handling} |
| 991 | |
| 992 | Another point where Python is really better than C is the issue of |
| 993 | error handling. It is a fact of life that everything involving RPC |
| 994 | may fail, for a variety of reasons outside the user's control: the |
| 995 | network may be disconnected, the server may be down, etc. Clients |
| 996 | must be prepared to handle such failures and recover from them, or at |
| 997 | least print an error message and die. In C this means that every |
| 998 | function returns an error status that must be checked by the caller, |
| 999 | causing programs to be cluttered with error checks --- or worse, |
| 1000 | programs that ignore errors and carry on working with garbage data. |
| 1001 | |
| 1002 | In Python, errors are generally indicated by exceptions, which can be |
| 1003 | handled out of line from the main flow of control if necessary, and |
| 1004 | cause immediate program termination (with a stack trace) if ignored. |
| 1005 | To profit from this feature, all RPC errors that may be encountered by |
| 1006 | AIL-generated stubs in Python are turned into exceptions. An extra |
| 1007 | value passed together with the exception is used to relay the error |
| 1008 | code returned by the server to the handler. Since in general RPC |
| 1009 | failures are rare, Python test programs can usually ignore exceptions |
| 1010 | --- making the program simpler --- without the risk of occasional |
Guido van Rossum | 16d6e71 | 1994-08-08 12:30:22 +0000 | [diff] [blame] | 1011 | errors going undetected. (I still remember the embarrassment of a |
Guido van Rossum | 2bbb3c0 | 1992-02-11 15:52:24 +0000 | [diff] [blame] | 1012 | hundredfold speed improvement reported, long, long, ago, about a new |
| 1013 | version of a certain program, which later had to be attributed to a |
| 1014 | benchmark that silently dumped core...) |
| 1015 | |
| 1016 | \subsubsection{Function Call Syntax} |
| 1017 | |
| 1018 | Amoeba RPC operations always need a capability parameter (this is what |
| 1019 | the `*' in the AIL function templates stands for); the service is |
| 1020 | identified by the port field of the capability. In C, the capability |
| 1021 | must always be the first parameter of the stub function, but in Python |
| 1022 | we can do better. |
| 1023 | |
| 1024 | A Python capability is an opaque object type in its own right, which |
| 1025 | is used, for instance, as parameter to and return value from Amoeba's |
| 1026 | name server functions. Python objects can have methods, so it is |
| 1027 | convenient to make all AIL-generated stubs methods of capabilities |
| 1028 | instead of just functions. Therefore, instead of writing |
| 1029 | \begin{verbatim} |
| 1030 | some_stub(cap, other_parameters) |
| 1031 | \end{verbatim} |
| 1032 | as in C, Python programmers can write |
| 1033 | \begin{verbatim} |
| 1034 | cap.some_stub(other_parameters) |
| 1035 | \end{verbatim} |
| 1036 | This is better because it reduces name conflicts: in Python, no |
| 1037 | confusion is possible between a stub and a local or global variable or |
| 1038 | user-defined function with the same name. |
| 1039 | |
| 1040 | \subsubsection{Example} |
| 1041 | |
| 1042 | All the preceding principles can be seen at work in the following |
| 1043 | example. Suppose a function is declared in AIL as follows: |
| 1044 | \begin{verbatim} |
| 1045 | some_stub(*, in char buf[size:1000], in int size, |
| 1046 | out int n_done, out int status); |
| 1047 | \end{verbatim} |
| 1048 | In C it might be called by the following code (including declarations, |
| 1049 | for clarity, but not initializations): |
| 1050 | \begin{verbatim} |
| 1051 | int err, n_done, status; |
| 1052 | capability cap; |
| 1053 | char buf[500]; |
| 1054 | ... |
| 1055 | err = some_stub(&cap, buf, sizeof buf, &n_done, &status); |
| 1056 | if (err != 0) return err; |
| 1057 | printf("%d done; status = %d\n", n_done, status); |
| 1058 | \end{verbatim} |
| 1059 | Equivalent code in Python might be the following: |
| 1060 | \begin{verbatim} |
| 1061 | cap = ... |
| 1062 | buf = ... |
| 1063 | n_done, status = cap.some_stub(buf) |
| 1064 | print n_done, 'done;', 'status =', status |
| 1065 | \end{verbatim} |
| 1066 | No explicit error check is required in Python: if the RPC fails, an |
| 1067 | exception is raised so the {\tt print} statement is never reached. |
| 1068 | |
| 1069 | \subsection{The Implementation} |
| 1070 | |
| 1071 | More or less orthogonal to the issue of how to map AIL operations to |
| 1072 | the Python language is the question of how they should be implemented. |
| 1073 | |
| 1074 | In principle it would be possible to use the same strategy that is |
| 1075 | used for C: add an interface to Amoeba's low-level RPC primitives to |
| 1076 | Python and generate Python code to marshal parameters into and out of |
| 1077 | a buffer. However, Python's high-level data types are not well suited |
| 1078 | for marshalling: byte-level operations are clumsy and expensive, with |
| 1079 | the result that marshalling a single byte of data can take several |
| 1080 | Python statements. This would mean that a large amount of code would |
| 1081 | be needed to implement a stub, which would cost a lot of time to parse |
| 1082 | and take up a lot of space in `compiled' form (as parse tree or pseudo |
| 1083 | code). Execution of the marshalling code would be sluggish as well. |
| 1084 | |
| 1085 | We therefore chose an alternate approach, writing the marshalling in |
| 1086 | C, which is efficient at such byte-level operations. While it is easy |
| 1087 | enough to generate C code that can be linked with the Python |
| 1088 | interpreter, it would obviously not stimulate the use of Python for |
| 1089 | server testing if each change to an interface required relinking the |
| 1090 | interpreter (dynamic loading of C code is not yet available on |
| 1091 | Amoeba). This is circumvented by the following solution: the |
| 1092 | marshalling is handled by a simple {\em virtual machine}, and AIL |
| 1093 | generates instructions for this machine. An interpreter for the |
| 1094 | machine is linked into the Python interpreter and reads its |
| 1095 | instructions from a file written by AIL. |
| 1096 | |
| 1097 | The machine language for our virtual machine is dubbed {\em Stubcode}. |
| 1098 | Stubcode is a super-specialized language. There are two sets of of |
| 1099 | about a dozen instructions each: one set marshals Python objects |
| 1100 | representing parameters into a buffer, the other set (similar but not |
| 1101 | quite symmetric) unmarshals results from a buffer into Python objects. |
| 1102 | The Stubcode interpreter uses a stack to hold Python intermediate |
| 1103 | results. Other state elements are an Amoeba header and buffer, a |
| 1104 | pointer indicating the current position in the buffer, and of course a |
| 1105 | program counter. Besides (un)marshalling, the virtual machine must |
| 1106 | also implement type checking, and raise a Python exception when a |
| 1107 | parameter does not have the expected type. |
| 1108 | |
| 1109 | The Stubcode interpreter marshals Python data types very efficiently, |
| 1110 | since each instruction can marshal a large amount of data. For |
| 1111 | instance, a whole Python string is marshalled by a single Stubcode |
| 1112 | instruction, which (after some checking) executes the most efficient |
| 1113 | byte-copying loop possible --- it calls {\tt memcpy()}. |
| 1114 | |
| 1115 | |
| 1116 | Construction details of the Stubcode interpreter are straightforward. |
| 1117 | Most complications are caused by the peculiarities of AIL's strategy |
| 1118 | module and Python's type system. By far the most complex single |
| 1119 | instruction is the `loop' instruction, which is used to marshal |
| 1120 | arrays. |
| 1121 | |
| 1122 | As an example, here is the complete Stubcode program (with spaces and |
| 1123 | comments added for clarity) generated for the function {\tt |
| 1124 | some\_stub()} of the example above. The stack contains pointers to |
| 1125 | Python objects, and its initial contents is the parameter to the |
| 1126 | function, the string {\tt buf}. The final stack contents will be the |
| 1127 | function return value, the tuple {\tt (n\_done, status)}. The name |
| 1128 | {\tt header} refers to the fixed size Amoeba RPC header structure. |
| 1129 | \vspace{1em} |
| 1130 | |
| 1131 | {\tt |
| 1132 | \begin{tabular}{l l l} |
| 1133 | BufSize & 1000 & {\em Allocate RPC buffer of 1000 bytes} \\ |
| 1134 | Dup & 1 & {\em Duplicate stack top} \\ |
| 1135 | StringS & & {\em Replace stack top by its string size} \\ |
| 1136 | PutI & h\_extra int32 & {\em Store top element in }header.h\_extra \\ |
| 1137 | TStringSlt & 1000 & {\em Assert string size less than 1000} \\ |
| 1138 | PutVS & & {\em Marshal variable-size string} \\ |
| 1139 | & & \\ |
| 1140 | Trans & 1234 & {\em Execute the RPC (request code 1234)} \\ |
| 1141 | & & \\ |
| 1142 | GetI & h\_extra int32 & {\em Push integer from} header.h\_extra \\ |
| 1143 | GetI & h\_size int32 & {\em Push integer from} header.h\_size \\ |
| 1144 | Pack & 2 & {\em Pack top 2 elements into a tuple} \\ |
| 1145 | \end{tabular} |
| 1146 | } |
| 1147 | \vspace{1em} |
| 1148 | |
| 1149 | As much work as possible is done by the Python back-end in AIL, rather |
| 1150 | than in the Stubcode interpreter, to make the latter both simple and |
| 1151 | fast. For instance, the decision to eliminate an array size parameter |
| 1152 | from the Python parameter list is taken by AIL, and Stubcode |
| 1153 | instructions are generated to recover the size from the actual |
| 1154 | parameter and to marshal it properly. Similarly, there is a special |
| 1155 | alignment instruction (not used in the example) to meet alignment |
| 1156 | requirements. |
| 1157 | |
| 1158 | Communication between AIL and the Stubcode generator is via the file |
| 1159 | system. For each stub function, AIL creates a file in its output |
| 1160 | directory, named after the stub with a specific suffix. This file |
| 1161 | contains a machine-readable version of the Stubcode program for the |
| 1162 | stub. The Python user can specify a search path containing |
| 1163 | directories which the interpreter searches for a Stubcode file the |
| 1164 | first time the definition for a particular stub is needed. |
| 1165 | |
| 1166 | The transformations on the parameter list and data types needed to map |
| 1167 | AIL data types to Python data types make it necessary to help the |
| 1168 | Python programmer a bit in figuring out the parameters to a call. |
| 1169 | Although in most cases the rules are simple enough, it is sometimes |
| 1170 | hard to figure out exactly what the parameter and return values of a |
| 1171 | particular stub are. There are two sources of help in this case: |
| 1172 | first, the exception contains enough information so that the user can |
| 1173 | figure what type was expected; second, AIL's Python back-end |
| 1174 | optionally generates a human-readable `interface specification' file. |
| 1175 | |
| 1176 | \section{Conclusion} |
| 1177 | |
| 1178 | We have succeeded in creating a useful extension to Python that |
| 1179 | enables Amoeba server writers to test and experiment with their server |
| 1180 | in a much more interactive manner. We hope that this facility will |
| 1181 | add to the popularity of AIL amongst Amoeba programmers. |
| 1182 | |
| 1183 | Python's extensibility was proven convincingly by the exercise |
| 1184 | (performed by the second author) of adding the Stubcode interpreter to |
| 1185 | Python. Standard data abstraction techniques are used to insulate |
| 1186 | extension modules from details of the rest of the Python interpreter. |
| 1187 | In the case of the Stubcode interpreter this worked well enough that |
| 1188 | it survived a major overhaul of the main Python interpreter virtually |
| 1189 | unchanged. |
| 1190 | |
| 1191 | On the other hand, adding a new back-end to AIL turned out to be quite |
| 1192 | a bit of work. One problem, specific to Python, was to be expected: |
| 1193 | Python's variable-size data types differ considerably from the |
| 1194 | C-derived data model that AIL favors. Two additional problems we |
| 1195 | encountered were the complexity of the interface between AIL's second |
| 1196 | and third phases, and a number of remaining bugs in the second phase |
| 1197 | that surfaced when the implementation of the Python back-end was |
| 1198 | tested. The bugs have been tracked down and fixed, but nothing |
| 1199 | has been done about the complexity of the interface. |
| 1200 | |
| 1201 | \subsection{Future Plans} |
| 1202 | |
| 1203 | AIL's C back-end generates server main loop code as well as client |
| 1204 | stubs. The Python back-end currently only generates client stubs, so |
| 1205 | it is not yet possible to write servers in Python. While it is |
| 1206 | clearly more important to be able to use Python as a client than as a |
| 1207 | server, the ability to write server prototypes in Python would be a |
| 1208 | valuable addition: it allows server designers to experiment with |
| 1209 | interfaces in a much earlier stage of the design, with a much smaller |
| 1210 | programming effort. This makes it possible to concentrate on concepts |
| 1211 | first, before worrying about efficient implementation. |
| 1212 | |
| 1213 | The unmarshalling done in the server is almost symmetric with the |
| 1214 | marshalling in the client, and vice versa, so relative small |
| 1215 | extensions to the Stubcode virtual machine will allow its use in a |
| 1216 | server main loop. We hope to find the time to add this feature to a |
| 1217 | future version of Python. |
| 1218 | |
| 1219 | \section{Availability} |
| 1220 | |
| 1221 | The Python source distribution is available to Internet users by |
| 1222 | anonymous ftp to site {\tt ftp.cwi.nl} [IP address 192.16.184.180] |
| 1223 | from directory {\tt /pub}, file name {\tt python*.tar.Z} (where the |
| 1224 | {\tt *} stands for a version number). This is a compressed UNIX tar |
| 1225 | file containing the C source and \LaTeX documentation for the Python |
| 1226 | interpreter. It includes the Python library modules and the {\em |
| 1227 | Stubcode} interpreter, as well as many example Python programs. Total |
| 1228 | disk space occupied by the distribution is about 3 Mb; compilation |
| 1229 | requires 1-3 Mb depending on the configuration built, the compile |
| 1230 | options, etc. |
| 1231 | |
| 1232 | \bibliographystyle{plain} |
| 1233 | |
| 1234 | \bibliography{quabib} |
| 1235 | |
| 1236 | \end{document} |