Blame - Doc/qua.tex - platform/external/python/cpython2

blob: 88f5778fa5a12ccdeeb1a993987f91c63901f428 [file] [log] [blame]

Guido van Rossum	aa1e140	1992-04-06 14:02:49 +0000	[diff] [blame]	1	\documentstyle[11pt]{article}
Guido van Rossum	16d6e71	1994-08-08 12:30:22 +0000	[diff] [blame]	2	\newcommand{\Cpp}{C\protect\raisebox{.18ex}{++}}
Guido van Rossum	2bbb3c0	1992-02-11 15:52:24 +0000	[diff] [blame]	3
				4	\title{
				5	Interactively Testing Remote Servers Using the Python Programming Language
				6	}
				7
				8	\author{
				9	Guido van Rossum \\
Guido van Rossum	d983cde	1995-03-15 12:53:31 +0000	[diff] [blame]	10	Dept. AA, CWI, P.O. Box 94079 \\
Guido van Rossum	db65a6c	1993-11-05 17:11:16 +0000	[diff] [blame]	11	1090 GB Amsterdam, The Netherlands \\
Guido van Rossum	2bbb3c0	1992-02-11 15:52:24 +0000	[diff] [blame]	12	E-mail: {\tt guido@cwi.nl}
				13	\and
				14	Jelke de Boer \\
				15	HIO Enschede; P.O.Box 1326 \\
				16	7500 BH Enschede, The Netherlands
				17	}
				18
				19	\begin{document}
				20
				21	\maketitle
				22
				23	\begin{abstract}
				24	This paper describes how two tools that were developed quite
				25	independently gained in power by a well-designed connection between
				26	them. The tools are Python, an interpreted prototyping language, and
				27	AIL, a Remote Procedure Call stub generator. The context is Amoeba, a
				28	well-known distributed operating system developed jointly by the Free
				29	University and CWI in Amsterdam.
				30
				31	As a consequence of their integration, both tools have profited:
				32	Python gained usability when used with Amoeba --- for which it was not
				33	specifically developed --- and AIL users now have a powerful
				34	interactive tool to test servers and to experiment with new
				35	client/server interfaces.%
				36	\footnote{
				37	An earlier version of this paper was presented at the Spring 1991
				38	EurOpen Conference in Troms{\o} under the title ``Linking a Stub
				39	Generator (AIL) to a Prototyping Language (Python).''
				40	}
				41	\end{abstract}
				42
				43	\section{Introduction}
				44
				45	Remote Procedure Call (RPC) interfaces, used in distributed systems
				46	like Amoeba
				47	\cite{Amoeba:IEEE,Amoeba:CACM},
				48	have a much more concrete character than local procedure call
				49	interfaces in traditional systems. Because clients and servers may
				50	run on different machines, with possibly different word size, byte
				51	order, etc., much care is needed to describe interfaces exactly and to
				52	implement them in such a way that they continue to work when a client
				53	or server is moved to a different machine. Since machines may fail
				54	independently, error handling must also be treated more carefully.
				55
				56	A common approach to such problems is to use a {\em stub generator}.
				57	This is a program that takes an interface description and transforms
				58	it into functions that must be compiled and linked with client and
				59	server applications. These functions are called by the application
				60	code to take care of details of interfacing to the system's RPC layer,
				61	to implement transformations between data representations of different
				62	machines, to check for errors, etc. They are called `stubs' because
				63	they don't actually perform the action that they are called for but
				64	only relay the parameters to the server
				65	\cite{RPC}.
				66
				67	Amoeba's stub generator is called AIL, which stands for Amoeba
				68	Interface Language
				69	\cite{AIL}.
				70	The first version of AIL generated only C functions, but an explicit
				71	goal of AIL's design was {\em retargetability}: it should be possible
				72	to add back-ends that generate stubs for different languages from the
				73	same interface descriptions. Moreover, the stubs generated by
				74	different back-ends must be {\em interoperable}: a client written in
				75	Modula-3, say, should be able to use a server written in C, and vice
				76	versa.
				77
				78	This interoperability is the key to the success of the marriage
				79	between AIL and Python. Python is a versatile interpreted language
				80	developed by the first author. Originally intended as an alternative
				81	for the kind of odd jobs that are traditionally solved by a mixture of
				82	shell scripts, manually given shell commands, and an occasional ad hoc
				83	C program, Python has evolved into a general interactive prototyping
				84	language. It has been applied to a wide range of problems, from
				85	replacements for large shell scripts to fancy graphics demos and
				86	multimedia applications.
				87
				88	One of Python's strengths is the ability for the user to type in some
				89	code and immediately run it: no compilation or linking is necessary.
				90	Interactive performance is further enhanced by Python's concise, clear
				91	syntax, its very-high-level data types, and its lack of declarations
				92	(which is compensated by run-time type checking). All this makes
				93	programming in Python feel like a leisure trip compared to the hard
				94	work involved in writing and debugging even a smallish C program.
				95
				96	It should be clear by now that Python will be the ideal tool to test
				97	servers and their interfaces. Especially during the development of a
				98	complex server, one often needs to generate test requests on an ad hoc
				99	basis, to answer questions like ``what happens if request X arrives
				100	when the server is in state Y,'' to test the behavior of the server
				101	with requests that touch its limitations, to check server responses to
				102	all sorts of wrong requests, etc. Python's ability to immediately
				103	execute `improvised' code makes it a much better tool for this
				104	situation than C.
				105
				106	The link to AIL extends Python with the necessary functionality to
				107	connect to arbitrary servers, making the server testbed sketched above
				108	a reality. Python's high-level data types, general programming
				109	features, and system interface ensure that it has all the power and
				110	flexibility needed for the job.
				111
				112	One could go even further than this. Current distributed operating
				113	systems, based on client-server interaction, all lack a good command
				114	language or `shell' to give adequate access to available services.
				115	Python has considerable potential for becoming such a shell.
				116
				117	\subsection{Overview of this Paper}
				118
				119	The rest of this paper contains three major sections and a conclusion.
				120	First an overview of the Python programming language is given. Next
				121	comes a short description of AIL, together with some relevant details
				122	about Amoeba. Finally, the design and construction of the link
				123	between Python and AIL is described in much detail. The conclusion
				124	looks back at the work and points out weaknesses and strengths of
				125	Python and AIL that were discovered in the process.
				126
				127	\section{An Overview of Python}
				128
				129	Python%
				130	\footnote{
				131	Named after the funny TV show, not the nasty reptile.
				132	}
				133	owes much to ABC
				134	\cite{ABC},
				135	a language developed at CWI as a programming language for non-expert
				136	computer users. Python borrows freely from ABC's syntax and data
				137	types, but adds modules, exceptions and classes, extensibility, and
				138	the ability to call system functions. The concepts of modules,
				139	exceptions and (to some extent) classes are influenced strongly by
				140	their occurrence in Modula-3
				141	\cite{Modula-3}.
				142
				143	Although Python resembles ABC in many ways, there is a a clear
				144	difference in application domain. ABC is intended to be the only
				145	programming language for those who use a computer as a tool, but
				146	occasionally need to write a program. For this reason, ABC is not
				147	just a programming language but also a programming environment, which
				148	comes with an integrated syntax-directed editor and some source
				149	manipulation commands. Python, on the other hand, aims to be a tool
				150	for professional (system) programmers, for whom having a choice of
				151	languages with different feature sets makes it possible to choose `the
				152	right tool for the job.' The features added to Python make it more
				153	useful than ABC in an environment where access to system functions
				154	(such as file and directory manipulations) are common. They also
				155	support the building of larger systems and libraries. The Python
				156	implementation offers little in the way of a programming environment,
				157	but is designed to integrate seamlessly with existing programming
				158	environments (e.g. UNIX and Emacs).
				159
				160	Perhaps the best introduction to Python is a short example. The
				161	following is a complete Python program to list the contents of a UNIX
				162	directory.
				163	\begin{verbatim}
				164	import sys, posix
				165
				166	def ls(dirname): # Print sorted directory contents
				167	names = posix.listdir(dirname)
				168	names.sort()
				169	for name in names:
				170	if name[0] != '.': print name
				171
				172	ls(sys.argv[1])
				173	\end{verbatim}
				174	The largest part of this program, in the middle starting with {\tt
				175	def}, is a function definition. It defines a function named {\tt ls}
				176	with a single parameter called {\tt dirname}. (Comments in Python
				177	start with `\#' and extend to the end of the line.) The function body
				178	is indented: Python uses indentation for statement grouping instead of
				179	braces or begin/end keywords. This is shorter to type and avoids
				180	frustrating mismatches between the perception of grouping by the user
				181	and the parser. Python accepts one statement per line; long
				182	statements may be broken in pieces using the standard backslash
				183	convention. If the body of a compound statement is a single, simple
				184	statement, it may be placed on the same line as the head.
				185
				186	The first statement of the function body calls the function {\tt
				187	listdir} defined in the module {\tt posix}. This function returns a
				188	list of strings representing the contents of the directory name passed
				189	as a string argument, here the argument {\tt dirname}. If {\tt
				190	dirname} were not a valid directory name, or perhaps not even a
				191	string, {\tt listdir} would raise an exception and the next statement
				192	would never be reached. (Exceptions can be caught in Python; see
				193	later.) Assuming {\tt listdir} returns normally, its result is
				194	assigned to the local variable {\tt names}.
				195
				196	The second statement calls the method {\tt sort} of the variable {\tt
				197	names}. This method is defined for all lists in Python and does the
				198	obvious thing: the elements of the list are reordered according to
				199	their natural ordering relationship. Since in our example the list
Guido van Rossum	d983cde	1995-03-15 12:53:31 +0000	[diff] [blame]	200	contains strings, they are sorted in ascending \ASCII{} order.
Guido van Rossum	2bbb3c0	1992-02-11 15:52:24 +0000	[diff] [blame]	201
				202	The last two lines of the function contain a loop that prints all
				203	elements of the list whose first character isn't a period. In each
				204	iteration, the {\tt for} statement assigns an element of the list to
				205	the local variable {\tt name}. The {\tt print} statement is intended
				206	for simple-minded output; more elaborate formatting is possible with
				207	Python's string handling functions.
				208
				209	The other two parts of the program are easily explained. The first
				210	line is an {\tt import} statement that tells the interpreter to import
				211	the modules {\tt sys} and {\tt posix}. As it happens these are both
				212	built into the interpreter. Importing a module (built-in or
				213	otherwise) only makes the module name available in the current scope;
				214	functions and data defined in the module are accessed through the dot
				215	notation as in {\tt posix.listdir}. The scope rules of Python are
				216	such that the imported module name {\tt posix} is also available in
				217	the function {\tt ls} (this will be discussed in more detail later).
				218
				219	Finally, the last line of the program calls the {\tt ls} function with
				220	a definite argument. It must be last since Python objects must be
				221	defined before they can be used; in particular, the function {\tt ls}
				222	must be defined before it can be called. The argument to {\tt ls} is
				223	{\tt sys.argv[1]}, which happens to be the Python equivalent of {\tt
				224	\$1} in a shell script or {\tt argv[1]} in a C program's {\tt main}
				225	function.
				226
				227	\subsection{Python Data Types}
				228
				229	(This and the following subsections describe Python in quite a lot of
				230	detail. If you are more interested in AIL, Amoeba and how they are
				231	linked with Python, you can skip to section 3 now.)
				232
				233	Python's syntax may not have big surprises (which is exactly as it
				234	should be), but its data types are quite different from what is found
				235	in languages like C, Ada or Modula-3. All data types in Python, even
				236	integers, are `objects'. All objects participate in a common garbage
				237	collection scheme (currently implemented using reference counting).
				238	Assignment is cheap, independent of object size and type: only a
				239	pointer to the assigned object is stored in the assigned-to variable.
				240	No type checking is performed on assignment; only specific operations
				241	like addition test for particular operand types.
				242
				243	The basic object types in Python are numbers, strings, tuples, lists
				244	and dictionaries. Some other object types are open files, functions,
				245	modules, classes, and class instances; even types themselves are
				246	represented as objects. Extension modules written in C can define
				247	additional object types; examples are objects representing windows and
				248	Amoeba capabilities. Finally, the implementation itself makes heavy
				249	use of objects, and defines some private object types that aren't
				250	normally visible to the user. There is no explicit pointer type in
				251	Python.
				252
				253	{\em Numbers}, both integers and floating point, are pretty
				254	straightforward. The notation for numeric literals is the same as in
				255	C, including octal and hexadecimal integers; precision is the same as
				256	{\tt long} or {\tt double} in C\@. A third numeric type, `long
				257	integer', written with an `L' suffix, can be used for arbitrary
				258	precision calculations. All arithmetic, shifting and masking
				259	operations from C are supported.
				260
				261	{\em Strings} are `primitive' objects just like numbers. String
				262	literals are written between single quotes, using similar escape
				263	sequences as in C\@. Operations are built into the language to
				264	concatenate and to replicate strings, to extract substrings, etc.
				265	There is no limit to the length of the strings created by a program.
				266	There is no separate character data type; strings of length one do
				267	nicely.
				268
				269	{\em Tuples} are a way to `pack' small amounts of heterogeneous data
				270	together and carry them around as a unit. Unlike structure members in
				271	C, tuple items are nameless. Packing and unpacking assignments allow
				272	access to the items, for example:
				273	\begin{verbatim}
				274	x = 'Hi', (1, 2), 'World' # x is a 3-item tuple,
				275	# its middle item is (1, 2)
				276	p, q, r = x # unpack x into p, q and r
				277	a, b = q # unpack q into a and b
				278	\end{verbatim}
				279	A combination of packing and unpacking assignment can be used as
				280	parallel assignment, and is idiom for permutations, e.g.:
				281	\begin{verbatim}
				282	p, q = q, p # swap without temporary
				283	a, b, c = b, c, a # cyclic permutation
				284	\end{verbatim}
				285	Tuples are also used for function argument lists if there is more than
				286	one argument. A tuple object, once created, cannot be modified; but
				287	it is easy enough to unpack it and create a new, modified tuple from
				288	the unpacked items and assign this to the variable that held the
				289	original tuple object (which will then be garbage-collected).
				290
				291	{\em Lists} are array-like objects. List items may be arbitrary
				292	objects and can be accessed and changed using standard subscription
				293	notation. Lists support item insertion and deletion, and can
				294	therefore be used as queues, stacks etc.; there is no limit to their
				295	size.
				296
				297	Strings, tuples and lists together are {\em sequence} types. These
				298	share a common notation for generic operations on sequences such as
				299	subscription, concatenation, slicing (taking subsequences) and
				300	membership tests. As in C, subscripts start at 0.
				301
				302	{\em Dictionaries} are `mappings' from one domain to another. The
				303	basic operations on dictionaries are item insertion, extraction and
				304	deletion, using subscript notation with the key as subscript. (The
				305	current implementation allows only strings in the key domain, but a
				306	future version of the language may remove this restriction.)
				307
				308	\subsection{Statements}
				309
				310	Python has various kinds of simple statements, such as assignments
				311	and {\tt print} statements, and several kinds of compound statements,
				312	like {\tt if} and {\tt for} statements. Formally, function
				313	definitions and {\tt import} statements are also statements, and there
				314	are no restrictions on the ordering of statements or their nesting:
				315	{\tt import} may be used inside a function, functions may be defined
				316	conditionally using an {\tt if} statement, etc. The effect of a
				317	declaration-like statement takes place only when it is executed.
				318
				319	All statements except assignments and expression statements begin with
				320	a keyword: this makes the language easy to parse. An overview of the
				321	most common statement forms in Python follows.
				322
				323	An {\em assignment} has the general form
				324	\vspace{\itemsep}
				325
				326	\noindent
				327	{\em variable $=$ variable $= ... =$ variable $=$ expression}
				328	\vspace{\itemsep}
				329
				330	It assigns the value of the expression to all listed variables. (As
				331	shown in the section on tuples, variables and expressions can in fact
				332	be comma-separated lists.) The assignment operator is not an
				333	expression operator; there are no horrible things in Python like
				334	\begin{verbatim}
				335	while (p = p->next) { ... }
				336	\end{verbatim}
				337	Expression syntax is mostly straightforward and will not be explained
				338	in detail here.
				339
				340	An {\em expression statement} is just an expression on a line by
				341	itself. This writes the value of the expression to standard output,
				342	in a suitably unambiguous way, unless it is a `procedure call' (a
				343	function call that returns no value). Writing the value is useful
				344	when Python is used in `calculator mode', and reminds the programmer
				345	not to ignore function results.
				346
				347	The {\tt if} statement allows conditional execution. It has optional
				348	{\tt elif} and {\tt else} parts; a construct like {\tt
				349	if...elif...elif...elif...else} can be used to compensate for the
				350	absence of a {\em switch} or {\em case} statement.
				351
				352	Looping is done with {\tt while} and {\tt for} statements. The latter
				353	(demonstrated in the `ls' example earlier) iterates over the elements
				354	of a `sequence' (see the discussion of data types below). It is
				355	possible to terminate a loop with a {\tt break} statement or to start
				356	the next iteration with {\tt continue}. Both looping statements have
				357	an optional {\tt else} clause which is executed after the loop is
				358	terminated normally, but skipped when it is terminated by {\tt break}.
				359	This can be handy for searches, to handle the case that the item is
				360	not found.
				361
				362	Python's {\em exception} mechanism is modelled after that of Modula-3.
				363	Exceptions are raised by the interpreter when an illegal operation is
				364	tried. It is also possible to explicitly raise an exception with the
				365	{\tt raise} statement:
				366	\vspace{\itemsep}
				367
				368	\noindent
				369	{\tt raise {\em expression}, {\em expression}}
				370	\vspace{\itemsep}
				371
				372	The first expression identifies which exception should be raised;
				373	there are several built-in exceptions and the user may define
				374	additional ones. The second, optional expression is passed to the
				375	handler, e.g. as a detailed error message.
				376
				377	Exceptions may be handled (caught) with the {\tt try} statement, which
				378	has the following general form:
				379	\vspace{\itemsep}
				380
				381	\noindent
				382	{\tt
				383	\begin{tabular}{l}
				384	try: {\em block} \\
				385	except {\em expression}, {\em variable}: {\em block} \\
				386	except {\em expression}, {\em variable}: {\em block} \\
				387	... \\
				388	except: {\em block}
				389	\end{tabular}
				390	}
				391	\vspace{\itemsep}
				392
				393	When an exception is raised during execution of the first block, a
				394	search for an exception handler starts. The first {\tt except} clause
				395	whose {\em expression} matches the exception is executed. The
				396	expression may specify a list of exceptions to match against. A
				397	handler without an expression serves as a `catch-all'. If there is no
				398	match, the search for a handler continues with outer {\tt try}
				399	statements; if no match is found on the entire invocation stack, an
				400	error message and stack trace are printed, and the program is
				401	terminated (interactively, the interpreter returns to its main loop).
				402
				403	Note that the form of the {\tt except} clauses encourages a style of
				404	programming whereby only selected exceptions are caught, passing
				405	unanticipated exceptions on to the caller and ultimately to the user.
				406	This is preferable over a simpler `catch-all' error handling
				407	mechanism, where a simplistic handler intended to catch a single type
				408	of error like `file not found' can easily mask genuine programming
				409	errors --- especially in a language like Python which relies strongly
				410	on run-time checking and allows the catching of almost any type of
				411	error.
				412
				413	Other common statement forms, which we have already encountered, are
				414	function definitions, {\tt import} statements and {\tt print}
				415	statements. There is also a {\tt del} statement to delete one or more
				416	variables, a {\tt return} statement to return from a function, and a
				417	{\tt global} statement to allow assignments to global variables.
				418	Finally, the {\tt pass} statement is a no-op.
				419
				420	\subsection{Execution Model}
				421
				422	A Python program is executed by a stack-based interpreter.
				423
				424	When a function is called, a new `execution environment' for it is
				425	pushed onto the stack. An execution environment contains (among other
				426	data) pointers to two `symbol tables' that are used to hold variables:
				427	the local and the global symbol table. The local symbol table
				428	contains local variables of the current function invocation (including
				429	the function arguments); the global symbol table contains variables
				430	defined in the module containing the current function.
				431
				432	The `global' symbol table is thus only global with respect to the
				433	current function. There are no system-wide global variables; using
				434	the {\tt import} statement it is easy enough to reference variables
				435	that are defined in other modules. A system-wide read-only symbol
				436	table is used for built-in functions and constants though.
				437
				438	On assignment to a variable, by default an entry for it is made in the
				439	local symbol table of the current execution environment. The {\tt
				440	global} command can override this (it is not enough that a global
				441	variable by the same name already exists). When a variable's value is
				442	needed, it is searched first in the local symbol table, then in the
				443	global one, and finally in the symbol table containing built-in
				444	functions and constants.
				445
				446	The term `variable' in this context refers to any name: functions and
				447	imported modules are searched in exactly the same way.
				448
				449	Names defined in a module's symbol table survive until the end of the
				450	program. This approximates the semantics of file-static global
				451	variables in C or module variables in Modula-3. A module is
				452	initialized the first time it is imported, by executing the text of
				453	the module as a parameterless function whose local and global symbol
				454	tables are the same, so names are defined in module's symbol table.
				455	(Modules implemented in C have another way to define symbols.)
				456
				457	A Python main program is read from standard input or from a script
				458	file passed as an argument to the interpreter. It is executed as if
				459	an anonymous module was imported. Since {\tt import} statements are
				460	executed like all other statements, the initialization order of the
				461	modules used in a program is defined by the flow of control through
				462	the program.
				463
				464	The `attribute' notation {\em m.name}, where {\em m} is a module,
				465	accesses the symbol {\em name} in that module's symbol table. It can
				466	be assigned to as well. This is in fact a special case of the
				467	construct {\em x.name} where {\em x} denotes an arbitrary object; the
				468	type of {\em x} determines how this is to be interpreted, and what
				469	assignment to it means.
				470
				471	For instance, when {\tt a} is a list object, {\tt a.append} yields a
				472	built-in `method' object which, when called, appends an item to {\tt a}.
				473	(If {\tt a} and {\tt b} are distinct list objects, {\tt a.append} and
				474	{\tt b.append} are distinguishable method objects.) Normally, in
				475	statements like {\tt a.append(x)}, the method object {\tt a.append} is
				476	called and then discarded, but this is a matter of convention.
				477
				478	List attributes are read-only --- the user cannot define new list
				479	methods. Some objects, like numbers and strings, have no attributes
				480	at all. Like all type checking in Python, the meaning of an attribute
				481	is determined at run-time --- when the parser sees {\em x.name}, it
				482	has no idea of the type of {\em x}. Note that {\em x} here does not
				483	have to be a variable --- it can be an arbitrary (perhaps
				484	parenthesized) expression.
				485
				486	Given the flexibility of the attribute notation, one is tempted to use
				487	methods to replace all standard operations. Yet, Python has kept a
				488	small repertoire of built-in functions like {\tt len()} and {\tt
				489	abs()}. The reason is that in some cases the function notation is
				490	more familiar than the method notation; just like programs would
				491	become less readable if all infix operators were replaced by function
				492	calls, they would become less readable if all function calls had to be
				493	replaced by method calls (and vice versa!).
				494
				495	The choice whether to make something a built-in function or a method
				496	is a matter of taste. For arithmetic and string operations, function
				497	notation is preferred, since frequently the argument to such an
				498	operation is an expression using infix notation, as in {\tt abs(a+b)};
				499	this definitely looks better than {\tt (a+b).abs()}. The choice
				500	between make something a built-in function or a function defined in a
				501	built-in method (requiring {\tt import}) is similarly guided by
				502	intuition; all in all, only functions needed by `general' programming
				503	techniques are built-in functions.
				504
				505	\subsection{Classes}
				506
				507	Python has a class mechanism distinct from the object-orientation
				508	already explained. A class in Python is not much more than a
				509	collection of methods and a way to create class instances. Class
				510	methods are ordinary functions whose first parameter is the class
				511	instance; they are called using the method notation.
				512
				513	For instance, a class can be defined as follows:
				514	\begin{verbatim}
				515	class Foo:
				516	def meth1(self, arg): ...
				517	def meth2(self): ...
				518	\end{verbatim}
				519	A class instance is created by
				520	{\tt x = Foo()}
				521	and its methods can be called thus:
				522	\begin{verbatim}
				523	x.meth1('Hi There!')
				524	x.meth2()
				525	\end{verbatim}
				526	The functions used as methods are also available as attributes of the
				527	class object, and the above method calls could also have been written
				528	as follows:
				529	\begin{verbatim}
				530	Foo.meth1(x, 'Hi There!')
				531	Foo.meth2(x)
				532	\end{verbatim}
				533	Class methods can store instance data by assigning to instance data
				534	attributes, e.g.:
				535	\begin{verbatim}
				536	self.size = 100
				537	self.title = 'Dear John'
				538	\end{verbatim}
				539	Data attributes do not have to be declared; as with local variables,
				540	they spring into existence when assigned to. It is a matter of
				541	discretion to avoid name conflicts with method names. This facility
				542	is also available to class users; instances of a method-less class can
				543	be used as records with named fields.
				544
				545	There is no built-in mechanism for instance initialization. Classes
				546	by convention provide an {\tt init()} method which initializes the
				547	instance and then returns it, so the user can write
				548	\begin{verbatim}
				549	x = Foo().init('Dr. Strangelove')
				550	\end{verbatim}
				551
				552	Any user-defined class can be used as a base class to derive other
				553	classes. However, built-in types like lists cannot be used as base
Guido van Rossum	16d6e71	1994-08-08 12:30:22 +0000	[diff] [blame]	554	classes. (Incidentally, the same is true in \Cpp{} and Modula-3.) A
Guido van Rossum	2bbb3c0	1992-02-11 15:52:24 +0000	[diff] [blame]	555	class may override any method of its base classes. Instance methods
				556	are first searched in the method list of their class, and then,
				557	recursively, in the method lists of their base class. Initialization
				558	methods of derived classes should explicitly call the initialization
				559	methods of their base class.
				560
				561	A simple form of multiple inheritance is also supported: a class can
				562	have multiple base classes, but the language rules for resolving name
				563	conflicts are somewhat simplistic, and consequently the feature has so
				564	far found little usage.
				565
				566	\subsection{The Python Library}
				567
				568	Python comes with an extensive library, structured as a collection of
				569	modules. A few modules are built into the interpreter: these
				570	generally provide access to system libraries implemented in C such as
				571	mathematical functions or operating system calls. Two built-in
				572	modules provide access to internals of the interpreter and its
				573	environment. Even abusing these internals will at most cause an
				574	exception in the Python program; the interpreter will not dump core
				575	because of errors in Python code.
				576
				577	Most modules however are written in Python and distributed with the
				578	interpreter; they provide general programming tools like string
				579	operations and random number generators, provide more convenient
				580	interfaces to some built-in modules, or provide specialized services
				581	like a {\em getopt}-style command line option processor for
				582	stand-alone scripts.
				583
				584	There are also some modules written in Python that dig deep in the
				585	internals of the interpreter; there is a module to browse the stack
				586	backtrace when an unhandled exception has occurred, one to disassemble
				587	the internal representation of Python code, and even an interactive
				588	source code debugger which can trace Python code, set breakpoints,
				589	etc.
				590
				591	\subsection{Extensibility}
				592
				593	It is easy to add new built-in modules written in C to the Python
				594	interpreter. Extensions appear to the Python user as built-in
				595	modules. Using a built-in module is no different from using a module
				596	written in Python, but obviously the author of a built-in module can
				597	do things that cannot be implemented purely in Python.
				598
				599	In particular, built-in modules can contain Python-callable functions
				600	that call functions from particular system libraries (`wrapper
				601	functions'), and they can define new object types. In general, if a
				602	built-in module defines a new object type, it should also provide at
				603	least one function that creates such objects. Attributes of such
				604	object types are also implemented in C; they can return data
				605	associated with the object or methods, implemented as C functions.
				606
				607	For instance, an extension was created for Amoeba: it provides wrapper
				608	functions for the basic Amoeba name server functions, and defines a
				609	`capability' object type, whose methods are file server operations.
				610	Another extension is a built-in module called {\tt posix}; it provides
				611	wrappers around post UNIX system calls. Extension modules also
				612	provide access to two different windowing/graphics interfaces: STDWIN
				613	\cite{STDWIN}
				614	(which connects to X11 on UNIX and to the Mac Toolbox on the
				615	Macintosh), and the Graphics Library (GL) for Silicon Graphics
				616	machines.
				617
				618	Any function in an extension module is supposed to type-check its
				619	arguments; the interpreter contains a convenience function to
				620	facilitate extracting C values from arguments and type-checking them
				621	at the same time. Returning values is also painless, using standard
				622	functions to create Python objects from C values.
				623
				624	On some systems extension modules may be dynamically loaded, thus
				625	avoiding the need to maintain a private copy of the Python interpreter
				626	in order to use a private extension.
				627
				628	\section{A Short Description of AIL and Amoeba}
				629
				630	An RPC stub generator takes an interface description as input. The
				631	designer of a stub generator has at least two choices for the input
				632	language: use a suitably restricted version of the target language, or
				633	design a new language. The first solution was chosen, for instance,
				634	by the designers of Flume, the stub generator for the Topaz
				635	distributed operating system built at DEC SRC
				636	\cite{Flume,Evolving}.
				637
				638	Flume's one and only target language is Modula-2+ (the predecessor of
				639	Modula-3). Modula-2+, like Modula-N for any N, has an interface
				640	syntax that is well suited as a stub generator input language: an
				641	interface module declares the functions that are `exported' by a
				642	module implementation, with their parameter and return types, plus the
				643	types and constants used for the parameters. Therefore, the input to
				644	Flume is simply a Modula-2+ interface module. But even in this ideal
				645	situation, an RPC stub generator needs to know things about functions
				646	that are not stated explicitly in the interface module: for instance,
				647	the transfer direction of VAR parameters (IN, OUT or both) is not
				648	given. Flume solves this and other problems by a mixture of
				649	directives hidden in comments and a convention for the names of
				650	objects. Thus, one could say that the designers of Flume really
				651	created a new language, even though it looks remarkably like their
				652	target language.
				653
				654	\subsection{The AIL Input Language}
				655
				656	Amoeba uses C as its primary programming language. C function
				657	declarations (at least in `Classic' C) don't specify the types of
				658	the parameters, let alone their transfer direction. Using this as
				659	input for a stub generator would require almost all information for
				660	the stub generator to be hidden inside comments, which would require a
				661	rather contorted scanner. Therefore we decided to design the input
				662	syntax for Amoeba's stub generator `from scratch'. This gave us the
				663	liberty to invent proper syntax not only for the transfer direction of
				664	parameters, but also for variable-length arrays.
				665
				666	On the other hand we decided not to abuse our freedom, and borrowed as
				667	much from C as we could. For instance, AIL runs its input through the
				668	C preprocessor, so we get macros, include files and conditional
				669	compilation for free. AIL's type declaration syntax is a superset of
				670	C's, so the user can include C header files to use the types declared
				671	there as function parameter types --- which are declared using
Guido van Rossum	16d6e71	1994-08-08 12:30:22 +0000	[diff] [blame]	672	function prototypes as in \Cpp{} or Standard C\@. It should be clear by
Guido van Rossum	2bbb3c0	1992-02-11 15:52:24 +0000	[diff] [blame]	673	now that AIL's lexical conventions are also identical to C's. The
				674	same is true for its expression syntax.
				675
				676	Where does AIL differ from C, then? Function declarations in AIL are
				677	grouped in {\em classes}. Classes in AIL are mostly intended as a
				678	grouping mechanism: all functions implemented by a server are grouped
				679	together in a class. Inheritance is used to form new groups by adding
				680	elements to existing groups; multiple inheritance is supported to join
				681	groups together. Classes can also contain constant and type
				682	definitions, and one form of output that AIL can generate is a header
				683	file for use by C programmers who wish to use functions from a
				684	particular AIL class.
				685
				686	Let's have a look at some (unrealistically simple) class definitions:
				687	\begin{verbatim}
				688	#include <amoeba.h> /* Defines `capability', etc. */
				689
				690	class standard_ops [1000 .. 1999] {
				691	/* Operations supported by most interfaces */
				692	std_info(*, out char buf[size:100], out int size);
				693	std_destroy(*);
				694	};
				695	\end{verbatim}
				696	This defines a class called `standard\_ops' whose request codes are
				697	chosen by AIL from the range 1000-1999. Request codes are small
				698	integers used to identify remote operations. The author of the class
				699	must specify a range from which AIL chooses, and class authors must
				700	make sure they avoid conflicts, e.g. by using an `assigned number
				701	administration office'. In the example, `std\_info' will be assigned
				702	request code 1000 and `std\_destroy' will get code 1001. There is
				703	also an option to explicitly assign request codes, for compatibility
				704	with servers with manually written interfaces.
				705
				706	The class `standard\_ops' defines two operations, `std\_info' and
				707	`std\_destroy'. The first parameter of each operation is a star
				708	(`*'); this is a placeholder for a capability that must be passed when
				709	the operation is called. The description of Amoeba below explains the
				710	meaning and usage of capabilities; for now, it is sufficient to know
				711	that a capability is a small structure that uniquely identifies an
				712	object and a server or service.
				713
				714	The standard operation `std\_info' has two output parameters: a
				715	variable-size character buffer (which will be filled with a short
				716	descriptive string of the object to which the operation is applied)
				717	and an integer giving the length of this string. The standard
				718	operation `std\_destroy' has no further parameters --- it just
				719	destroys the object, if the caller has the right to do so.
				720
				721	The next class is called `tty':
				722	\begin{verbatim}
				723	class tty [2000 .. 2099] {
				724	inherit standard_ops;
				725	const TTY_MAXBUF = 1000;
				726	tty_write(*, char buf[size:TTY_MAXBUF], int size);
				727	tty_read(*, out char buf[size:TTY_MAXBUF], out int size);
				728	};
				729	\end{verbatim}
				730	The request codes for operations defined in this class lie in the
				731	range 2000-2099; inherited operations use the request codes already
				732	assigned to them. The operations defined by this class are
				733	`tty\_read' and `tty\_write', which pass variable-sized data buffers
				734	between client and server. Class `tty' inherits class
				735	`standard\_ops', so tty objects also support the operations
				736	`std\_info' and `std\_destroy'.
				737
				738	Only the {\em interface} for `std\_info' and `std\_destroy' is shared
				739	between tty objects and other objects whose interface inherits
				740	`standard\_ops'; the implementation may differ. Even multiple
				741	implementations of the `tty' interface may exist, e.g. a driver for a
				742	console terminal and a terminal emulator in a window. To expand on
				743	the latter example, consider:
				744	\begin{verbatim}
				745	class window [2100 .. 2199] {
				746	inherit standard_ops;
				747	win_create(*, int x, int y, int width, int height,
				748	out capability win_cap);
				749	win_reconfigure(*, int x, int y, int width, int height);
				750	};
				751
				752	class tty_emulator [2200 .. 2299] {
				753	inherit tty, window;
				754	};
				755	\end{verbatim}
				756	Here two new interface classes are defined.
				757	Class `window' could be used for creating and manipulating windows.
				758	Note that `win\_create' returns a capability for the new window.
				759	This request should probably should be sent to a generic window
				760	server capability, or it might create a subwindow when applied to a
				761	window object.
				762
				763	Class `tty\_emulator' demonstrates the essence of multiple inheritance.
				764	It is presumably the interface to a window-based terminal emulator.
				765	Inheritance is transitive, so `tty\_emulator' also implicitly inherits
				766	`standard\_ops'.
				767	In fact, it inherits it twice: once via `tty' and once via `window'.
				768	Since AIL class inheritance only means interface sharing, not
				769	implementation sharing, inheriting the same class multiple times is
				770	never a problem and has the same effect as inheriting it once.
				771
Guido van Rossum	16d6e71	1994-08-08 12:30:22 +0000	[diff] [blame]	772	Note that the power of AIL classes doesn't go as far as \Cpp{}.
Guido van Rossum	2bbb3c0	1992-02-11 15:52:24 +0000	[diff] [blame]	773	AIL classes cannot have data members, and there is
				774	no mechanism for a server that implements a derived class
				775	to inherit the implementation of the base
				776	class --- other than copying the source code.
				777	The syntax for class definitions and inheritance is also different.
				778
				779	\subsection{Amoeba}
				780
				781	The smell of `object-orientedness' that the use of classes in AIL
				782	creates matches nicely with Amoeba's object-oriented approach to
				783	RPC\@. In Amoeba, almost all operating system entities (files,
				784	directories, processes, devices etc.) are implemented as {\em
				785	objects}. Objects are managed by {\em services} and represented by
				786	{\em capabilities}. A capability gives its holder access to the
				787	object it represents. Capabilities are protected cryptographically
				788	against forgery and can thus be kept in user space. A capability is a
				789	128-bit binary string, subdivided as follows:
				790
				791	% XXX Need a better version of this picture!
				792	\begin{verbatim}
				793	48 24 8 48 Bits
				794	+----------------+------------+--------+---------------+
				795	\| Service \| Object \| Perm. \| Check \|
				796	\| port \| number \| bits \| word \|
				797	+----------------+------------+--------+---------------+
				798	\end{verbatim}
				799
				800	The service port is used by the RPC implementation in the Amoeba
				801	kernel to locate a server implementing the service that manages the
				802	object. In many cases there is a one-to-one correspondence between
				803	servers and services (each service is implemented by exactly one
				804	server process), but some services are replicated. For instance,
				805	Amoeba's directory service, which is crucial for gaining access to most
				806	other services, is implemented by two servers that listen on the same
				807	port and know about exactly the same objects.
				808
				809	The object number in the capability is used by the server receiving
				810	the request for identifying the object to which the operation applies.
				811	The permission bits specify which operations the holder of the capability
				812	may apply. The last part of a capability is a 48-bit long `check
				813	word', which is used to prevent forgery. The check word is computed
				814	by the server based upon the permission bits and a random key per object
				815	that it keeps secret. If you change the permission bits you must compute
				816	the proper check word or else the server will refuse the capability.
				817	Due to the size of the check word and the nature of the cryptographic
				818	`one-way function' used to compute it, inverting this function is
				819	impractical, so forging capabilities is impossible.%
				820	\footnote{
				821	As computers become faster, inverting the one-way function becomes
				822	less impractical.
				823	Therefore, a next version of Amoeba will have 64-bit check words.
				824	}
				825
				826	A working Amoeba system is a collection of diverse servers, managing
				827	files, directories, processes, devices etc. While most servers have
				828	their own interface, there are some requests that make sense for some
				829	or all object types. For instance, the {\em std\_info()} request,
				830	which returns a short descriptive string, applies to all object types.
				831	Likewise, {\em std\_destroy()} applies to files, directories and
				832	processes, but not to devices.
				833
				834	Similarly, different file server implementations may want to offer the
				835	same interface for operations like {\em read()} and {\em write()} to
				836	their clients. AIL's grouping of requests into classes is ideally
				837	suited to describe this kind of interface sharing, and a class
				838	hierarchy results which clearly shows the similarities between server
				839	interfaces (not necessarily their implementations!).
				840
				841	The base class of all classes defines the {\em std\_info()} request.
				842	Most server interfaces actually inherit a derived class that also
				843	defines {\em std\_destroy().} File servers inherit a class that
				844	defines the common operations on files, etc.
				845
				846	\subsection{How AIL Works}
				847
				848	The AIL stub generator functions in three phases:
				849	\begin{itemize}
				850	\item
				851	parsing,
				852	\item
				853	strategy determination,
				854	\item
				855	code generation.
				856	\end{itemize}
				857
				858	{\bf Phase one} parses the input and builds a symbol table containing
				859	everything it knows about the classes and other definitions found in
				860	the input.
				861
				862	{\bf Phase two} determines the strategy to use for each function
				863	declaration in turn and decides upon the request and reply message
				864	formats. This is not a simple matter, because of various optimization
				865	attempts. Amoeba's kernel interface for RPC requests takes a
				866	fixed-size header and one arbitrary-size buffer. A large part of the
				867	header holds the capability of the object to which the request is
				868	directed, but there is some space left for a few integer parameters
				869	whose interpretation is left up to the server. AIL tries to use these
				870	slots for simple integer parameters, for two reasons.
				871
				872	First, unlike the buffer, header fields are byte-swapped by the RPC
				873	layer in the kernel if necessary, so it saves a few byte swapping
				874	instructions in the user code. Second, and more important, a common
				875	form of request transfers a few integers and one large buffer to or
				876	from a server. The {\em read()} and {\em write()} requests of most
				877	file servers have this form, for instance. If it is possible to place
				878	all integer parameters in the header, the address of the buffer
				879	parameter can be passed directly to the kernel RPC layer. While AIL
				880	is perfectly capable of handling requests that do not fit this format,
				881	the resulting code involves allocating a new buffer and copying all
				882	parameters into it. It is a top priority to avoid this copying
				883	(`marshalling') if at all possible, in order to maintain Amoeba's
				884	famous RPC performance.
				885
				886	When AIL resorts to copying parameters into a buffer, it reorders them
				887	so that integers indicating the lengths of variable-size arrays are
				888	placed in the buffer before the arrays they describe, since otherwise
				889	decoding the request would be impossible. It also adds occasional
				890	padding bytes to ensure integers are aligned properly in the buffer ---
				891	this can speed up (un)marshalling.
				892
				893	{\bf Phase three} is the code generator, or back-end. There are in
				894	fact many different back-ends that may be called in a single run to
				895	generate different types of output. The most important output types
				896	are header files (for inclusion by the clients of an interface),
				897	client stubs, and `server main loop' code. The latter decodes
				898	incoming requests in the server. The generated code depends on the
				899	programming language requested, and there are separate back-ends for
				900	each supported language.
				901
				902	It is important that the strategy chosen by phase two is independent
				903	of the language requested for phase three --- otherwise the
				904	interoperability of servers and clients written in different languages
				905	would be compromised.
				906
				907	\section{Linking AIL to Python}
				908
				909	From the previous section it can be concluded that linking AIL to
				910	Python is a matter of writing a back-end for Python. This is indeed
				911	what we did.
				912
				913	Considerable time went into the design of the back-end in order to
				914	make the resulting RPC interface for Python fit as smoothly as
				915	possible in Python's programming style. For instance, the issues of
				916	parameter transfer, variable-size arrays, error handling, and call
				917	syntax were all solved in a manner that favors ease of use in Python
				918	rather than strict correspondence with the stubs generated for C,
				919	without compromising network-level compatibility.
				920
				921	\subsection{Mapping AIL Entities to Python}
				922
				923	For each programming language that AIL is to support, a mapping must
				924	be designed between the data types in AIL and those in that language.
				925	Other aspects of the programming languages, such as differences in
				926	function call semantics, must also be taken care of.
				927
				928	While the mapping for C is mostly straightforward, the mapping for
				929	Python requires a little thinking to get the best results for Python
				930	programmers.
				931
				932	\subsubsection{Parameter Transfer Direction}
				933
				934	Perhaps the simplest issue is that of parameter transfer direction.
				935	Parameters of functions declared in AIL are categorized as being of
				936	type {\tt in}, {\tt out} or {\tt in} {\tt out} (the same distinction
				937	as made in Ada). Python only has call-by-value parameter semantics;
				938	functions can return multiple values as a tuple. This means that,
				939	unlike the C back-end, the Python back-end cannot always generate
				940	Python functions with exactly the same parameter list as the AIL
				941	functions.
				942
				943	Instead, the Python parameter list consists of all {\tt in} and {\tt
				944	in} {\tt out} parameters, in the order in which they occur in the AIL
				945	parameter list; similarly, the Python function returns a tuple
				946	containing all {\tt in} {\tt out} and {\tt out} parameters. In fact
				947	Python packs function parameters into a tuple as well, stressing the
				948	symmetry between parameters and return value. For example, a stub
				949	with this AIL parameter list:
				950	\begin{verbatim}
				951	(*, in int p1, in out int p2, in int p3, out int p4)
				952	\end{verbatim}
				953	will have the following parameter list and return values in Python:
				954	\begin{verbatim}
				955	(p1, p2, p3) -> (p2, p4)
				956	\end{verbatim}
				957
				958	\subsubsection{Variable-size Entities}
				959
				960	The support for variable-size objects in AIL is strongly guided by the
				961	limitations of C in this matter. Basically, AIL allows what is
				962	feasible in C: functions may have variable-size arrays as parameters
				963	(both input or output), provided their length is passed separately.
				964	In practice this is narrowed to the following rule: for each
				965	variable-size array parameter, there must be an integer parameter
				966	giving its length. (An exception for null-terminated strings is
				967	planned but not yet realized.)
				968
				969	Variable-size arrays in AIL or C correspond to {\em sequences} in
				970	Python: lists, tuples or strings. These are much easier to use than
				971	their C counterparts. Given a sequence object in Python, it is always
				972	possible to determine its size: the built-in function {\tt len()}
				973	returns it. It would be annoying to require the caller of an RPC stub
				974	with a variable-size parameter to also pass a parameter that
				975	explicitly gives its size. Therefore we eliminate all parameters from
				976	the Python parameter list whose value is used as the size of a
				977	variable-size array. Such parameters are easily found: the array
				978	bound expression contains the name of the parameter giving its size.
				979	This requires the stub code to work harder (it has to recover the
				980	value for size parameters from the corresponding sequence parameter),
				981	but at least part of this work would otherwise be needed as well, to
				982	check that the given and actual sizes match.
				983
				984	Because of the symmetry in Python between the parameter list and the
				985	return value of a function, the same elimination is performed on
				986	return values containing variable-size arrays: integers returned
				987	solely to tell the client the size of a returned array are not
				988	returned explicitly to the caller in Python.
				989
				990	\subsubsection{Error Handling}
				991
				992	Another point where Python is really better than C is the issue of
				993	error handling. It is a fact of life that everything involving RPC
				994	may fail, for a variety of reasons outside the user's control: the
				995	network may be disconnected, the server may be down, etc. Clients
				996	must be prepared to handle such failures and recover from them, or at
				997	least print an error message and die. In C this means that every
				998	function returns an error status that must be checked by the caller,
				999	causing programs to be cluttered with error checks --- or worse,
				1000	programs that ignore errors and carry on working with garbage data.
				1001
				1002	In Python, errors are generally indicated by exceptions, which can be
				1003	handled out of line from the main flow of control if necessary, and
				1004	cause immediate program termination (with a stack trace) if ignored.
				1005	To profit from this feature, all RPC errors that may be encountered by
				1006	AIL-generated stubs in Python are turned into exceptions. An extra
				1007	value passed together with the exception is used to relay the error
				1008	code returned by the server to the handler. Since in general RPC
				1009	failures are rare, Python test programs can usually ignore exceptions
				1010	--- making the program simpler --- without the risk of occasional
Guido van Rossum	16d6e71	1994-08-08 12:30:22 +0000	[diff] [blame]	1011	errors going undetected. (I still remember the embarrassment of a
Guido van Rossum	2bbb3c0	1992-02-11 15:52:24 +0000	[diff] [blame]	1012	hundredfold speed improvement reported, long, long, ago, about a new
				1013	version of a certain program, which later had to be attributed to a
				1014	benchmark that silently dumped core...)
				1015
				1016	\subsubsection{Function Call Syntax}
				1017
				1018	Amoeba RPC operations always need a capability parameter (this is what
				1019	the `*' in the AIL function templates stands for); the service is
				1020	identified by the port field of the capability. In C, the capability
				1021	must always be the first parameter of the stub function, but in Python
				1022	we can do better.
				1023
				1024	A Python capability is an opaque object type in its own right, which
				1025	is used, for instance, as parameter to and return value from Amoeba's
				1026	name server functions. Python objects can have methods, so it is
				1027	convenient to make all AIL-generated stubs methods of capabilities
				1028	instead of just functions. Therefore, instead of writing
				1029	\begin{verbatim}
				1030	some_stub(cap, other_parameters)
				1031	\end{verbatim}
				1032	as in C, Python programmers can write
				1033	\begin{verbatim}
				1034	cap.some_stub(other_parameters)
				1035	\end{verbatim}
				1036	This is better because it reduces name conflicts: in Python, no
				1037	confusion is possible between a stub and a local or global variable or
				1038	user-defined function with the same name.
				1039
				1040	\subsubsection{Example}
				1041
				1042	All the preceding principles can be seen at work in the following
				1043	example. Suppose a function is declared in AIL as follows:
				1044	\begin{verbatim}
				1045	some_stub(*, in char buf[size:1000], in int size,
				1046	out int n_done, out int status);
				1047	\end{verbatim}
				1048	In C it might be called by the following code (including declarations,
				1049	for clarity, but not initializations):
				1050	\begin{verbatim}
				1051	int err, n_done, status;
				1052	capability cap;
				1053	char buf[500];
				1054	...
				1055	err = some_stub(&cap, buf, sizeof buf, &n_done, &status);
				1056	if (err != 0) return err;
				1057	printf("%d done; status = %d\n", n_done, status);
				1058	\end{verbatim}
				1059	Equivalent code in Python might be the following:
				1060	\begin{verbatim}
				1061	cap = ...
				1062	buf = ...
				1063	n_done, status = cap.some_stub(buf)
				1064	print n_done, 'done;', 'status =', status
				1065	\end{verbatim}
				1066	No explicit error check is required in Python: if the RPC fails, an
				1067	exception is raised so the {\tt print} statement is never reached.
				1068
				1069	\subsection{The Implementation}
				1070
				1071	More or less orthogonal to the issue of how to map AIL operations to
				1072	the Python language is the question of how they should be implemented.
				1073
				1074	In principle it would be possible to use the same strategy that is
				1075	used for C: add an interface to Amoeba's low-level RPC primitives to
				1076	Python and generate Python code to marshal parameters into and out of
				1077	a buffer. However, Python's high-level data types are not well suited
				1078	for marshalling: byte-level operations are clumsy and expensive, with
				1079	the result that marshalling a single byte of data can take several
				1080	Python statements. This would mean that a large amount of code would
				1081	be needed to implement a stub, which would cost a lot of time to parse
				1082	and take up a lot of space in `compiled' form (as parse tree or pseudo
				1083	code). Execution of the marshalling code would be sluggish as well.
				1084
				1085	We therefore chose an alternate approach, writing the marshalling in
				1086	C, which is efficient at such byte-level operations. While it is easy
				1087	enough to generate C code that can be linked with the Python
				1088	interpreter, it would obviously not stimulate the use of Python for
				1089	server testing if each change to an interface required relinking the
				1090	interpreter (dynamic loading of C code is not yet available on
				1091	Amoeba). This is circumvented by the following solution: the
				1092	marshalling is handled by a simple {\em virtual machine}, and AIL
				1093	generates instructions for this machine. An interpreter for the
				1094	machine is linked into the Python interpreter and reads its
				1095	instructions from a file written by AIL.
				1096
				1097	The machine language for our virtual machine is dubbed {\em Stubcode}.
				1098	Stubcode is a super-specialized language. There are two sets of of
				1099	about a dozen instructions each: one set marshals Python objects
				1100	representing parameters into a buffer, the other set (similar but not
				1101	quite symmetric) unmarshals results from a buffer into Python objects.
				1102	The Stubcode interpreter uses a stack to hold Python intermediate
				1103	results. Other state elements are an Amoeba header and buffer, a
				1104	pointer indicating the current position in the buffer, and of course a
				1105	program counter. Besides (un)marshalling, the virtual machine must
				1106	also implement type checking, and raise a Python exception when a
				1107	parameter does not have the expected type.
				1108
				1109	The Stubcode interpreter marshals Python data types very efficiently,
				1110	since each instruction can marshal a large amount of data. For
				1111	instance, a whole Python string is marshalled by a single Stubcode
				1112	instruction, which (after some checking) executes the most efficient
				1113	byte-copying loop possible --- it calls {\tt memcpy()}.
				1114
				1115
				1116	Construction details of the Stubcode interpreter are straightforward.
				1117	Most complications are caused by the peculiarities of AIL's strategy
				1118	module and Python's type system. By far the most complex single
				1119	instruction is the `loop' instruction, which is used to marshal
				1120	arrays.
				1121
				1122	As an example, here is the complete Stubcode program (with spaces and
				1123	comments added for clarity) generated for the function {\tt
				1124	some\_stub()} of the example above. The stack contains pointers to
				1125	Python objects, and its initial contents is the parameter to the
				1126	function, the string {\tt buf}. The final stack contents will be the
				1127	function return value, the tuple {\tt (n\_done, status)}. The name
				1128	{\tt header} refers to the fixed size Amoeba RPC header structure.
				1129	\vspace{1em}
				1130
				1131	{\tt
				1132	\begin{tabular}{l l l}
				1133	BufSize & 1000 & {\em Allocate RPC buffer of 1000 bytes} \\
				1134	Dup & 1 & {\em Duplicate stack top} \\
				1135	StringS & & {\em Replace stack top by its string size} \\
				1136	PutI & h\_extra int32 & {\em Store top element in }header.h\_extra \\
				1137	TStringSlt & 1000 & {\em Assert string size less than 1000} \\
				1138	PutVS & & {\em Marshal variable-size string} \\
				1139	& & \\
				1140	Trans & 1234 & {\em Execute the RPC (request code 1234)} \\
				1141	& & \\
				1142	GetI & h\_extra int32 & {\em Push integer from} header.h\_extra \\
				1143	GetI & h\_size int32 & {\em Push integer from} header.h\_size \\
				1144	Pack & 2 & {\em Pack top 2 elements into a tuple} \\
				1145	\end{tabular}
				1146	}
				1147	\vspace{1em}
				1148
				1149	As much work as possible is done by the Python back-end in AIL, rather
				1150	than in the Stubcode interpreter, to make the latter both simple and
				1151	fast. For instance, the decision to eliminate an array size parameter
				1152	from the Python parameter list is taken by AIL, and Stubcode
				1153	instructions are generated to recover the size from the actual
				1154	parameter and to marshal it properly. Similarly, there is a special
				1155	alignment instruction (not used in the example) to meet alignment
				1156	requirements.
				1157
				1158	Communication between AIL and the Stubcode generator is via the file
				1159	system. For each stub function, AIL creates a file in its output
				1160	directory, named after the stub with a specific suffix. This file
				1161	contains a machine-readable version of the Stubcode program for the
				1162	stub. The Python user can specify a search path containing
				1163	directories which the interpreter searches for a Stubcode file the
				1164	first time the definition for a particular stub is needed.
				1165
				1166	The transformations on the parameter list and data types needed to map
				1167	AIL data types to Python data types make it necessary to help the
				1168	Python programmer a bit in figuring out the parameters to a call.
				1169	Although in most cases the rules are simple enough, it is sometimes
				1170	hard to figure out exactly what the parameter and return values of a
				1171	particular stub are. There are two sources of help in this case:
				1172	first, the exception contains enough information so that the user can
				1173	figure what type was expected; second, AIL's Python back-end
				1174	optionally generates a human-readable `interface specification' file.
				1175
				1176	\section{Conclusion}
				1177
				1178	We have succeeded in creating a useful extension to Python that
				1179	enables Amoeba server writers to test and experiment with their server
				1180	in a much more interactive manner. We hope that this facility will
				1181	add to the popularity of AIL amongst Amoeba programmers.
				1182
				1183	Python's extensibility was proven convincingly by the exercise
				1184	(performed by the second author) of adding the Stubcode interpreter to
				1185	Python. Standard data abstraction techniques are used to insulate
				1186	extension modules from details of the rest of the Python interpreter.
				1187	In the case of the Stubcode interpreter this worked well enough that
				1188	it survived a major overhaul of the main Python interpreter virtually
				1189	unchanged.
				1190
				1191	On the other hand, adding a new back-end to AIL turned out to be quite
				1192	a bit of work. One problem, specific to Python, was to be expected:
				1193	Python's variable-size data types differ considerably from the
				1194	C-derived data model that AIL favors. Two additional problems we
				1195	encountered were the complexity of the interface between AIL's second
				1196	and third phases, and a number of remaining bugs in the second phase
				1197	that surfaced when the implementation of the Python back-end was
				1198	tested. The bugs have been tracked down and fixed, but nothing
				1199	has been done about the complexity of the interface.
				1200
				1201	\subsection{Future Plans}
				1202
				1203	AIL's C back-end generates server main loop code as well as client
				1204	stubs. The Python back-end currently only generates client stubs, so
				1205	it is not yet possible to write servers in Python. While it is
				1206	clearly more important to be able to use Python as a client than as a
				1207	server, the ability to write server prototypes in Python would be a
				1208	valuable addition: it allows server designers to experiment with
				1209	interfaces in a much earlier stage of the design, with a much smaller
				1210	programming effort. This makes it possible to concentrate on concepts
				1211	first, before worrying about efficient implementation.
				1212
				1213	The unmarshalling done in the server is almost symmetric with the
				1214	marshalling in the client, and vice versa, so relative small
				1215	extensions to the Stubcode virtual machine will allow its use in a
				1216	server main loop. We hope to find the time to add this feature to a
				1217	future version of Python.
				1218
				1219	\section{Availability}
				1220
				1221	The Python source distribution is available to Internet users by
				1222	anonymous ftp to site {\tt ftp.cwi.nl} [IP address 192.16.184.180]
				1223	from directory {\tt /pub}, file name {\tt python*.tar.Z} (where the
				1224	{\tt *} stands for a version number). This is a compressed UNIX tar
				1225	file containing the C source and \LaTeX documentation for the Python
				1226	interpreter. It includes the Python library modules and the {\em
				1227	Stubcode} interpreter, as well as many example Python programs. Total
				1228	disk space occupied by the distribution is about 3 Mb; compilation
				1229	requires 1-3 Mb depending on the configuration built, the compile
				1230	options, etc.
				1231
				1232	\bibliographystyle{plain}
				1233
				1234	\bibliography{quabib}
				1235
				1236	\end{document}