\documentstyle[twoside,11pt,myformat,times]{report}

\title{\bf Extending and Embedding the Python Interpreter}

\author{
    Guido van Rossum \\
    Dept. CST, CWI, P.O. Box 94079 \\
    1090 GB Amsterdam, The Netherlands \\
    E-mail: {\tt guido@cwi.nl}
}

\date{19 November 1993 \\ Release 0.9.9.++} % XXX update before release!

% Tell \index to actually write the .idx file
\makeindex

\begin{document}

\pagenumbering{roman}

\maketitle

\begin{abstract}

\noindent
This document describes how to write modules in C or C++ to extend the
Python interpreter. It also describes how to use Python as an
`embedded' language, and how extension modules can be loaded
dynamically (at run time) into the interpreter, if the operating
system supports this feature.

\end{abstract}

\pagebreak

{
\parskip = 0mm
\tableofcontents
}

\pagebreak

\pagenumbering{arabic}


\chapter{Extending Python with C or C++ code}


\section{Introduction}

It is quite easy to add non-standard built-in modules to Python, if
you know how to program in C. A built-in module known to the Python
programmer as \code{foo} is generally implemented by a file called
\file{foomodule.c}. All but the most essential standard built-in
modules also adhere to this convention, and in fact some of them form
excellent examples of how to create an extension.

Extension modules can do two things that can't be done directly in
Python: they can implement new data types, and they can make system
calls or call C library functions. Since the latter is usually the
most important reason for adding an extension, I'll concentrate on
adding `wrappers' around C library functions; the concrete example
uses the wrapper for
\code{system()} in module \code{posix}, found in (of course) the file
\file{posixmodule.c}.

It is important not to be impressed by the size and complexity of
the average extension module; much of this is straightforward
`boilerplate' code (starting right with the copyright notice)!

Let's skip the boilerplate and have a look at an interesting function
in \file{posixmodule.c} first:

\begin{verbatim}
    static object *
    posix_system(self, args)
        object *self;
        object *args;
    {
        char *command;
        int sts;
        if (!getargs(args, "s", &command))
            return NULL;
        sts = system(command);
        return mkvalue("i", sts);
    }
\end{verbatim}

This is the prototypical top-level function in an extension module.
It will be called (we'll see later how this is made possible) when the
Python program executes statements like

\begin{verbatim}
    >>> import posix
    >>> sts = posix.system('ls -l')
\end{verbatim}

There is a straightforward translation from the arguments to the call
in Python (here the single value \code{'ls -l'}) to the arguments that
are passed to the C function. The C function always has two
parameters, conventionally named \var{self} and \var{args}. In this
example, \var{self} will always be a \code{NULL} pointer, since this is a
function, not a method (this is done so that the interpreter doesn't
have to understand two different types of C functions).

The \var{args} parameter will be a pointer to a Python object, or
\code{NULL} if the Python function/method was called without
arguments. It is necessary to do full argument type checking on each
call, since otherwise the Python user would be able to cause the
Python interpreter to `dump core' by passing the wrong arguments to a
function in an extension module (or no arguments at all). Because
argument checking and converting arguments to C is such a common task,
there's a general function in the Python interpreter which combines
these tasks: \code{getargs()}. It uses a template string to determine
both the types of the Python argument and the types of the C variables
into which it should store the converted values. (More about this
later.)\footnote{
There are convenience macros \code{getstrarg()},
\code{getintarg()}, etc., for many common forms of \code{getargs()}
templates. These are relics from the past; it's better to call
\code{getargs()} directly.}

If \code{getargs()} returns nonzero, the argument list has the right
type and its components have been stored in the variables whose
addresses are passed. If it returns zero, an error has occurred. In
the latter case it has already raised an appropriate exception by
calling \code{err_setstr()}, so the calling function can just return
\code{NULL}.


\section{Intermezzo: errors and exceptions}

An important convention throughout the Python interpreter is the
following: when a function fails, it should set an exception condition
and return an error value (often a \code{NULL} pointer). Exceptions are set
in a global variable in the file \file{errors.c}; if this variable is
\code{NULL} no exception has occurred. A second variable is the
`associated value' of the exception.

The file \file{errors.h} declares a host of \code{err_*} functions to
set various types of exceptions. The most common one is
\code{err_setstr()} --- its arguments are an exception object (e.g.
\code{RuntimeError} --- actually it can be any string object) and a C
string indicating the cause of the error (this is converted to a
string object and stored as the `associated value' of the exception).
Another useful function is \code{err_errno()}, which only takes an
exception argument and constructs the associated value by inspection
of the (UNIX) global variable \code{errno}.

You can test non-destructively whether an exception has been set with
\code{err_occurred()}. However, most code never calls
\code{err_occurred()} to see whether an error occurred or not, but
relies on error return values from the functions it calls instead:

When a function that calls another function detects that the called
function fails, it should return an error value but not set an
exception condition --- one is already set. The caller is then
supposed to also return an error indication to {\em its} caller, again
{\em without} calling \code{err_setstr()}, and so on --- the most
detailed cause of the error was already reported by the function that
detected it in the first place. Once the error has reached Python's
interpreter main loop, this aborts the currently executing Python code
and tries to find an exception handler specified by the Python
programmer.

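The propagation rule can be tried out without the interpreter. What
follows is only a minimal sketch under stated assumptions, not the
interpreter's own code: \code{last_error}, \code{set_error()},
\code{clear_error()}, \code{parse_digit()} and \code{parse_pair()} are
hypothetical names, with \code{set_error()} playing the role of
\code{err_setstr()} and \code{clear_error()} that of \code{err_clear()}.

```c
#include <stddef.h>

/* Hypothetical stand-in for the interpreter's exception variable:
   NULL means that no exception is set. */
static const char *last_error = NULL;

/* Hypothetical stand-in for err_setstr(). */
static void set_error(const char *message)
{
    last_error = message;
}

/* Hypothetical stand-in for err_clear(). */
static void clear_error(void)
{
    last_error = NULL;
}

/* Lowest level: this function detects the failure, so it sets the
   exception.  Returns 0 for success and -1 for failure. */
static int parse_digit(char c, int *value)
{
    if (c < '0' || c > '9') {
        set_error("not a digit");
        return -1;
    }
    *value = c - '0';
    return 0;
}

/* Caller: it sees the failure and only passes it on.  It does not
   set the exception again, since one is already set. */
static int parse_pair(const char *s, int *sum)
{
    int a, b;
    if (parse_digit(s[0], &a) < 0)
        return -1;
    if (parse_digit(s[1], &b) < 0)
        return -1;
    *sum = a + b;
    return 0;
}
```

Only \code{parse_digit()} ever records a message; \code{parse_pair()}
just hands the failure on, which is what keeps the most detailed cause
of the error intact on its way up.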
To ignore an exception set by a function call that failed, the
exception condition must be cleared explicitly by calling
\code{err_clear()}. The only time C code should call
\code{err_clear()} is if it doesn't want to pass the error on to the
interpreter but wants to handle it completely by itself (e.g. by
trying something else or pretending nothing happened).

Finally, the function \code{err_get()} gives you both error variables
{\em and clears them}. Note that even if an error occurred the second one
may be \code{NULL}. I doubt you will need to use this function.

Note that a failing \code{malloc()} call must also be turned into an
exception --- the direct caller of \code{malloc()} (or
\code{realloc()}) must call \code{err_nomem()} and return a failure
indicator itself. All the object-creating functions
(\code{newintobject()} etc.) already do this, so this note only
matters if you call \code{malloc()} directly.

Also note that, with the important exception of \code{getargs()}, functions
that return an integer status usually use 0 for success and -1 for
failure.

Finally, be careful about cleaning up garbage (making appropriate
[\code{X}]\code{DECREF()} calls) when you return an error!


\section{Back to the example}

Going back to \code{posix_system()}, you should now be able to
understand this bit:

\begin{verbatim}
    if (!getargs(args, "s", &command))
        return NULL;
\end{verbatim}

It returns \code{NULL} (the error indicator for functions of this kind) if an
error is detected in the argument list, relying on the exception set
by \code{getargs()}. Otherwise, a pointer to the string value of the
argument has been stored in the local variable \code{command}.

If a Python function is called with multiple arguments, the argument
list is turned into a tuple. Python programs can use this feature, for
instance, to explicitly create the tuple containing the arguments
first and make the call later.

The next statement in \code{posix_system()} is a call to the C library
function \code{system()}, passing it the string we just got from
\code{getargs()}:

\begin{verbatim}
    sts = system(command);
\end{verbatim}

Python strings may contain internal null bytes; but if these occur in
this example the rest of the string will be ignored by \code{system()}.

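A wrapper that would rather reject such strings than silently truncate
them needs the full length of the string, which the \samp{s\#} format
of \code{getargs()} (described later) can supply. As a small sketch,
the helper below reports whether a buffer of known length contains a
null byte; \code{has_embedded_nul()} is a hypothetical name, not part
of the interpreter.

```c
#include <string.h>

/* Returns 1 if the first `size' bytes of `data' contain a null byte,
   0 otherwise.  `size' is the full length of the string, not the
   length that strlen() would report. */
static int has_embedded_nul(const char *data, int size)
{
    return memchr(data, '\0', (size_t)size) != NULL;
}
```

For the five-byte string \code{"ls\textbackslash 0-l"} the helper
returns 1, while \code{strlen()} sees only the first two characters.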
Finally, \code{posix.system()} must return a value: the integer status
returned by the C library \code{system()} function. The listing above
does this with \code{mkvalue("i", sts)}; the same can be done with the
function \code{newintobject()}, which takes a (long) integer as parameter:

\begin{verbatim}
    return newintobject((long)sts);
\end{verbatim}

(Yes, even integers are represented as objects on the heap in Python!)
If you had a function that returned no useful value, you would need
this idiom:

\begin{verbatim}
    INCREF(None);
    return None;
\end{verbatim}

\code{None} is a unique Python object representing `no value'. It differs
from \code{NULL}, which means `error' in most contexts (except when passed as
a function argument --- there it means `no arguments').


\section{The module's function table}

I promised to show how I made the function \code{posix_system()}
available to Python programs. This is shown later in \file{posixmodule.c}:

\begin{verbatim}
    static struct methodlist posix_methods[] = {
        ...
        {"system", posix_system},
        ...
        {NULL, NULL} /* Sentinel */
    };

    void
    initposix()
    {
        (void) initmodule("posix", posix_methods);
    }
\end{verbatim}

(The actual \code{initposix()} is somewhat more complicated, but most
extension modules are indeed as simple as that.) When the Python
program first imports module \code{posix}, \code{initposix()} is called,
which calls \code{initmodule()} with specific parameters. This
creates a module object (which is inserted in the table \code{sys.modules}
under the key \code{'posix'}), and adds built-in-function objects to the
newly created module based upon the table (of type \code{struct methodlist})
that was passed as its second parameter. The function
\code{initmodule()} returns a pointer to the module object that it
creates, but this is unused here. It aborts with a fatal error if the
module could not be initialized satisfactorily.


\section{Calling the module initialization function}

There is one more thing to do: telling the Python interpreter to call the
\code{initfoo()} function when it encounters an \samp{import foo} statement.
This is done in the file \file{config.c}. This file contains a table mapping
module names to parameterless void function pointers. You need to add
a declaration of \code{initfoo()} somewhere early in the file, and a
line saying

\begin{verbatim}
    {"foo", initfoo},
\end{verbatim}

to the initializer for \code{inittab[]}. It is conventional to include both
the declaration and the initializer line in preprocessor commands
\code{\#ifdef USE_FOO} / \code{\#endif}, to make it easy to turn the
foo extension on or off. Note that the Macintosh version uses a
different configuration file, distributed as \file{configmac.c}. This
strategy may be extended to other operating system versions, although
usually the standard \file{config.c} file gives a pretty useful starting
point for a new \file{config*.c} file.

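How such a table can be consulted is sketched below. This is a
self-contained illustration under assumptions, not the code in
\file{config.c}: the structure layout, the \code{NULL} sentinel, the
lookup function \code{run_init_function()} and both initialization
functions are hypothetical (a real \code{initfoo()} would call
\code{initmodule()}).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical initialization functions; this initfoo() only
   records that it has been called. */
static int foo_initialized = 0;
static void initfoo(void) { foo_initialized = 1; }
static void initbar(void) { }

/* Table mapping module names to parameterless void functions. */
static struct {
    const char *name;
    void (*initfunc)(void);
} inittab[] = {
    {"foo", initfoo},
    {"bar", initbar},
    {NULL, NULL}    /* Sentinel (assumed) */
};

/* Look up `name' in the table and call its initialization function.
   Returns 1 if the module was found, 0 if not. */
static int run_init_function(const char *name)
{
    int i;
    for (i = 0; inittab[i].name != NULL; i++) {
        if (strcmp(inittab[i].name, name) == 0) {
            (*inittab[i].initfunc)();
            return 1;
        }
    }
    return 0;
}
```

With a table like this, adding a module costs one declaration and one
table line, which is why wrapping both in \code{\#ifdef} is enough to
switch an extension on or off.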
And, of course, I forgot the Makefile. This is actually not too hard:
just follow the examples for, say, AMOEBA. Find all occurrences
of the string AMOEBA in the Makefile and do the same for FOO that's
done for AMOEBA...

(Note: if you are using dynamic loading for your extension, you don't
need to edit \file{config.c} and the Makefile. See \file{./DYNLOAD} for more
info about this.)


\section{Calling Python functions from C}

The above concentrates on making C functions accessible to the Python
programmer. The reverse is also often useful: calling Python
functions from C. This is especially the case for libraries that
support so-called `callback' functions. If a C interface makes heavy
use of callbacks, the equivalent Python often needs to provide a
callback mechanism to the Python programmer; the implementation may
require calling the Python callback functions from a C callback.
Other uses are also possible.

Fortunately, the Python interpreter is easily called recursively, and
there is a standard interface to call a Python function. I won't
dwell on how to call the Python parser with a particular string as
input --- if you're interested, have a look at the implementation of
the \samp{-c} command line option in \file{pythonmain.c}.

Calling a Python function is easy. First, the Python program must
somehow pass you the Python function object. You should provide a
function (or some other interface) to do this. When this function is
called, save a pointer to the Python function object (be careful to
\code{INCREF()} it!) in a global variable --- or wherever you see fit.
For example, the following function might be part of a module
definition:

\begin{verbatim}
    static object *my_callback;

    static object *
    my_set_callback(dummy, arg)
        object *dummy, *arg;
    {
        XDECREF(my_callback); /* Dispose of previous callback */
        my_callback = arg;
        XINCREF(my_callback); /* Remember new callback */
        /* Boilerplate for "void" return */
        INCREF(None);
        return None;
    }
\end{verbatim}

Later, when it is time to call the function, you call the C function
\code{call_object()}. This function has two arguments, both pointers
to arbitrary Python objects: the Python function, and the argument.
The argument can be \code{NULL} to call the function without arguments. For
example:

\begin{verbatim}
    object *result;
    ...
    /* Time to call the callback */
    result = call_object(my_callback, (object *)NULL);
\end{verbatim}

\code{call_object()} returns a Python object pointer: this is
the return value of the Python function. \code{call_object()} is
`reference-count-neutral' with respect to its arguments, but the
return value is `new': either it is a brand new object, or it is an
existing object whose reference count has been incremented. So, you
should somehow apply \code{DECREF()} to the result, even (especially!) if you
are not interested in its value.

Before you do this, however, it is important to check that the return
value isn't \code{NULL}. If it is, the Python function terminated by raising
an exception. If the C code that called \code{call_object()} is
called from Python, it should now return an error indication to its
Python caller, so the interpreter can print a stack trace, or the
calling Python code can handle the exception. If this is not possible
or desirable, the exception should be cleared by calling
\code{err_clear()}. For example:

\begin{verbatim}
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Depending on the desired interface to the Python callback function,
you may also have to provide an argument to \code{call_object()}. In
some cases the argument is also provided by the Python program,
through the same interface that specified the callback function. It
can then be saved and used in the same manner as the function object.
In other cases, you may have to construct a new object to pass as
argument. In this case you must dispose of it as well. For example,
if you want to pass an integral event code, you might use the
following code:

\begin{verbatim}
    object *argument;
    ...
    argument = newintobject((long)eventcode);
    result = call_object(my_callback, argument);
    DECREF(argument);
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Note the placement of \code{DECREF(argument)} immediately after the call,
before the error check! Also note that strictly speaking this code is
not complete: \code{newintobject()} may run out of memory, and this
should be checked.

In even more complicated cases you may want to pass the callback
function multiple arguments. To this end you have to construct (and
dispose of!) a tuple object. Details (mostly concerned with the
error checks and reference count manipulation) are left as an
exercise for the reader; most of this is also needed when returning
multiple values from a function.

XXX TO DO: explain objects.

XXX TO DO: defining new object types.


\section{Format strings for {\tt getargs()}}

The \code{getargs()} function is declared in \file{modsupport.h} as
follows:

\begin{verbatim}
    int getargs(object *arg, char *format, ...);
\end{verbatim}

The remaining arguments must be addresses of variables whose type is
determined by the format string. For the conversion to succeed, the
\var{arg} object must match the format and the format must be exhausted.
Note that while \code{getargs()} checks that the Python object really
is of the specified type, it cannot check that the addresses provided
in the call match: if you make mistakes there, your code will probably
dump core.

A format string consists of a single `format unit'. A format unit
describes one Python object; it is usually a single character or a
parenthesized string. The type of a format unit is determined from
its first character, the `format letter':

\begin{description}

\item[\samp{s} (string)]
The Python object must be a string object. The C argument must be a
char** (i.e. the address of a character pointer), and a pointer to
the C string contained in the Python object is stored into it. If the
next character in the format string is \samp{\#}, another C argument
of type int* must be present, and the length of the Python string (not
counting the trailing zero byte) is stored into it.

\item[\samp{z} (string or zero, i.e. \code{NULL})]
Like \samp{s}, but the object may also be \code{None}. In this case the
string pointer is set to \code{NULL} and if a \samp{\#} is present the size
is set to 0.

\item[\samp{b} (byte, i.e. char interpreted as tiny int)]
The object must be a Python integer. The C argument must be a char*.

\item[\samp{h} (half, i.e. short)]
The object must be a Python integer. The C argument must be a short*.

\item[\samp{i} (int)]
The object must be a Python integer. The C argument must be an int*.

\item[\samp{l} (long)]
The object must be a (plain!) Python integer. The C argument must be
a long*.

\item[\samp{c} (char)]
The Python object must be a string of length 1. The C argument must
be a char*. (Don't pass an int*!)

\item[\samp{f} (float)]
The object must be a Python int or float. The C argument must be a
float*.

\item[\samp{d} (double)]
The object must be a Python int or float. The C argument must be a
double*.

\item[\samp{S} (string object)]
The object must be a Python string. The C argument must be an
object** (i.e. the address of an object pointer). The C program thus
gets back the actual string object that was passed, not just a pointer
to its array of characters and its size as for format character
\samp{s}.

\item[\samp{O} (object)]
The object can be any Python object, including \code{None}, but not
\code{NULL}. The C argument must be an object**. This can be used if
an argument list must contain objects of a type for which no format
letter exists: the caller must then check that it has the right type.

\item[\samp{(} (tuple)]
The object must be a Python tuple. Following the \samp{(} character
in the format string must come a number of format units describing the
elements of the tuple, followed by a \samp{)} character. Tuple
format units may be nested. (There are no exceptions for empty and
singleton tuples; \samp{()} specifies an empty tuple and \samp{(i)} a
singleton of one integer. Normally you don't want to use the latter,
since it is hard for the user to specify.)

\end{description}

More format characters will probably be added as the need arises. It
should be allowed to use Python long integers wherever integers are
expected, and perform a range check. (A range check is in fact always
necessary for the \samp{b}, \samp{h} and \samp{i} format
letters, but this is currently not implemented.)

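Until such a check exists, an extension can do it by hand: fetch the
value with \samp{l} and narrow it explicitly. The helpers below are a
sketch of that idea only; \code{store_short()} and \code{store_int()}
are hypothetical names, and they merely report failure where real code
would also set an exception.

```c
#include <limits.h>

/* Store `value' into *result if it fits in a short.  Returns nonzero
   for success and zero for failure, like getargs(); on failure
   *result is left untouched. */
static int store_short(long value, short *result)
{
    if (value < SHRT_MIN || value > SHRT_MAX)
        return 0;
    *result = (short)value;
    return 1;
}

/* The same check for an int.  Where int and long have the same size
   it can never fail. */
static int store_int(long value, int *result)
{
    if (value < INT_MIN || value > INT_MAX)
        return 0;
    *result = (int)value;
    return 1;
}
```

The explicit comparison against the limits from \file{limits.h} is the
whole point: a plain assignment from long to short would quietly
discard the high-order bits.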
Some example calls:

\begin{verbatim}
    int ok;
    int i, j;
    long k, l;
    char *s;
    int size;

    ok = getargs(args, "(lls)", &k, &l, &s); /* Two longs and a string */
        /* Possible Python call: f(1, 2, 'three') */

    ok = getargs(args, "s", &s); /* A string */
        /* Possible Python call: f('whoops!') */

    ok = getargs(args, ""); /* No arguments */
        /* Python call: f() */

    ok = getargs(args, "((ii)s#)", &i, &j, &s, &size);
        /* A pair of ints and a string, whose size is also returned */
        /* Possible Python call: f((1, 2), 'three') */

    {
        int left, top, right, bottom, h, v;
        ok = getargs(args, "(((ii)(ii))(ii))",
                 &left, &top, &right, &bottom, &h, &v);
                 /* A rectangle and a point */
                 /* Possible Python call:
                    f( ((0, 0), (400, 300)), (10, 10)) */
    }
\end{verbatim}

Note that a format string must consist of a single unit; strings like
\samp{is} and \samp{(ii)s\#} are not valid format strings. (But
\samp{s\#} is.)

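The single-unit rule is easy to state as a small recursive check.
The sketch below is not how \code{getargs()} itself is implemented; it
only follows the description of the format letters given above, and
\code{skip_format_unit()} and \code{is_single_format_unit()} are
hypothetical names. It deliberately rejects the empty string, which
\code{getargs()} accepts as the special case `no arguments'.

```c
#include <stddef.h>
#include <string.h>

/* Skip one format unit starting at `p'.  Returns a pointer just past
   the unit, or NULL if no valid unit starts there. */
static const char *skip_format_unit(const char *p)
{
    if (*p == '(') {
        p++;
        while (*p != ')') {
            if (*p == '\0')
                return NULL;            /* Missing ')' */
            p = skip_format_unit(p);
            if (p == NULL)
                return NULL;
        }
        return p + 1;                   /* Skip the ')' */
    }
    if (*p == 's' || *p == 'z') {
        p++;
        if (*p == '#')
            p++;                        /* Optional length marker */
        return p;
    }
    if (*p != '\0' && strchr("bhilcfdSO", *p) != NULL)
        return p + 1;
    return NULL;
}

/* Returns 1 if `format' is exactly one format unit, 0 otherwise. */
static int is_single_format_unit(const char *format)
{
    const char *end = skip_format_unit(format);
    return end != NULL && *end == '\0';
}
```

The check accepts \samp{s\#} and \samp{((ii)s\#)} but rejects
\samp{is} and \samp{(ii)s\#}, because in the last two something is
left over after the first unit.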
The \code{getargs()} function does not support variable-length
argument lists. In simple cases you can fake these by trying several
calls to
\code{getargs()} until one succeeds, but you must take care to call
\code{err_clear()} before each retry. For example:

\begin{verbatim}
    static object *my_method(self, args) object *self, *args; {
        int i, j, k;

        if (getargs(args, "(ii)", &i, &j)) {
            k = 0; /* Use default third argument */
        }
        else {
            err_clear();
            if (!getargs(args, "(iii)", &i, &j, &k))
                return NULL;
        }
        /* ... use i, j and k here ... */
        INCREF(None);
        return None;
    }
\end{verbatim}

(It is possible to think of an extension to the definition of format
strings to accommodate this directly, e.g., placing a \samp{|} in a
tuple might specify that the remaining arguments are optional.
\code{getargs()} should then return one more than the number of
variables stored into.)

Advanced users note: If you set the `varargs' flag in the method list
for a function, the argument will always be a tuple (the `raw argument
list'). In this case you must enclose single and empty argument lists
in parentheses, e.g., \samp{(s)} and \samp{()}.


\section{The {\tt mkvalue()} function}

This function is the counterpart to \code{getargs()}. It is declared
in \file{modsupport.h} as follows:

\begin{verbatim}
    object *mkvalue(char *format, ...);
\end{verbatim}

It supports exactly the same format letters as \code{getargs()}, but
the arguments (which are input to the function, not output) must not
be pointers, just values. If a byte, short or float is passed to a
varargs function, it is widened by the compiler to int or double, so
\samp{b} and \samp{h} are treated as \samp{i} and \samp{f} is
treated as \samp{d}. \samp{S} is treated as \samp{O}, \samp{s} is
treated as \samp{z}. \samp{z\#} and \samp{s\#} are supported: a
second argument specifies the length of the data (negative means use
\code{strlen()}). \samp{S} and \samp{O} add a reference to their
argument (so you should \code{DECREF()} it if you've just created it
and aren't going to use it again).

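The widening is a property of C itself and can be seen in any varargs
function. The following sketch is not \code{mkvalue()}; it is a
hypothetical function, \code{sum_values()}, that merely adds up its
numeric arguments, but it has to fetch them in the same way.

```c
#include <stdarg.h>

/* Add up the numeric arguments described by `format'.  Only the
   letters b, h, i, l, f and d are understood; other characters are
   skipped without fetching an argument. */
static double sum_values(const char *format, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, format);
    for (; *format != '\0'; format++) {
        switch (*format) {
        case 'b':       /* A char arrives widened to int */
        case 'h':       /* A short arrives widened to int */
        case 'i':
            total += va_arg(ap, int);
            break;
        case 'l':
            total += va_arg(ap, long);
            break;
        case 'f':       /* A float arrives widened to double */
        case 'd':
            total += va_arg(ap, double);
            break;
        }
    }
    va_end(ap);
    return total;
}
```

Since the callee can only ever fetch an int or a double, there is
nothing for separate \samp{b}, \samp{h} or \samp{f} cases to do, which
is why the letters collapse as described.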
Guido van Rossumdb65a6c1993-11-05 17:11:16 +0000610If the argument for \samp{O} or \samp{S} is a NULL pointer, it is
611assumed that this was caused because the call producing the argument
612found an error and set an exception. Therefore, \code{mkvalue()} will
613return \code{NULL} but won't set an exception if one is already set.
614If no exception is set, \code{SystemError} is set.
Guido van Rossum7a2dba21993-11-05 14:45:11 +0000615
Guido van Rossumdb65a6c1993-11-05 17:11:16 +0000616If there is an error in the format string, the \code{SystemError}
617exception is set, since it is the calling C code's fault, not that of
618the Python user who sees the exception.
Guido van Rossum7a2dba21993-11-05 14:45:11 +0000619
620Example:
621
622\begin{verbatim}
623 return mkvalue("(ii)", 0, 0);
624\end{verbatim}
625
626returns a tuple containing two zeros. (Outer parentheses in the
627format string are actually superfluous, but you can use them for
Guido van Rossumdb65a6c1993-11-05 17:11:16 +0000628compatibility with \code{getargs()}, which requires them if more than
629one argument is expected.)


\section{Reference counts}

Here's a useful explanation of \code{INCREF()} and \code{DECREF()}
(after an original by Sjoerd Mullender).

Use \code{XINCREF()} or \code{XDECREF()} instead of \code{INCREF()} /
\code{DECREF()} when the argument may be \code{NULL}.

The basic idea is: if you create an extra reference to an object, you
must \code{INCREF()} it; if you throw away a reference to an object,
you must \code{DECREF()} it. Functions such as
\code{newstringobject()}, \code{newsizedstringobject()},
\code{newintobject()}, etc. create a reference to an object. If you
want to throw away the object thus created, you must use
\code{DECREF()}.

If you put an object into a tuple or list using \code{settupleitem()}
or \code{setlistitem()}, the idea is that you usually don't want to
keep a reference of your own around, so Python does not
\code{INCREF()} the elements. It does \code{DECREF()} the old value.
This means that if you put something into such an object using the
functions Python provides for this, you must \code{INCREF()} the
object if you also want to keep a separate reference to the object around.
Also, if you replace an element, you should \code{INCREF()} the old
element first if you want to keep it. If you didn't \code{INCREF()}
it before you replaced it, you are not allowed to look at it anymore,
since it may have been freed.

Returning an object to Python (i.e., when your C function returns)
creates a reference to an object, but it does not change the reference
count. When your code does not keep another reference to the object,
you should not \code{INCREF()} or \code{DECREF()} it (assuming it is a
newly created object). When you do keep a reference around, you
should \code{INCREF()} the object. Also, when you return a global
object such as \code{None}, you should \code{INCREF()} it.

If you want to return a tuple, you should consider using
\code{mkvalue()}. This function creates a new tuple with a reference
count of 1 which you can return. If any of the elements you put into
the tuple are objects (format codes \samp{O} or \samp{S}), they
are \code{INCREF()}'ed by \code{mkvalue()}. If you don't want to keep
references to those elements around, you should \code{DECREF()} them
after having called \code{mkvalue()}.

Usually you don't have to worry about arguments. They are
\code{INCREF()}'ed before your function is called and
\code{DECREF()}'ed after your function returns. When you keep a
reference to an argument, you should \code{INCREF()} it and
\code{DECREF()} it when you throw it away. Also, when you return an
argument, you should \code{INCREF()} it, because returning the
argument creates an extra reference to it.

If you use \code{getargs()} to parse the arguments, you can get a
reference to an object (by using \samp{O} in the format string). This
object was not \code{INCREF()}'ed, so you should not \code{DECREF()}
it. If you want to keep the object, you must \code{INCREF()} it
yourself.

If you create your own type of objects, you should use \code{NEWOBJ()}
to create the object. This sets the reference count to 1. If you
want to throw away the object, you should use \code{DECREF()}. When
the reference count reaches zero, your type's \code{dealloc()}
function is called. In it, you should \code{DECREF()} all objects to
which you keep references in your object, but you should not use
\code{DECREF()} on your object. You should use \code{DEL()} instead.
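
The ownership rules can be tried out without the interpreter. The
following is a toy model, not the real definitions from Python's
header files: the \code{toyobject} type, the \code{TOY_INCREF()} and
\code{TOY_DECREF()} macros and the two functions are all hypothetical
names invented for this sketch. It mimics a container slot whose set
operation takes over the caller's reference and drops the reference
to the old value, which is the behavior described above for
\code{settupleitem()} and \code{setlistitem()}.

\begin{verbatim}
        #include <assert.h>
        #include <stdlib.h>

        typedef struct {
                int refcnt;
        } toyobject;

        int toy_live = 0;       /* objects not yet freed */

        #define TOY_INCREF(op) ((op)->refcnt++)
        #define TOY_DECREF(op) \
                do { \
                        if (--(op)->refcnt == 0) { \
                                free((op)); \
                                toy_live--; \
                        } \
                } while (0)

        /* Create an object; the caller owns the one reference. */
        toyobject *toy_new(void)
        {
                toyobject *op = malloc(sizeof(toyobject));
                assert(op != NULL);
                op->refcnt = 1;
                toy_live++;
                return op;
        }

        /* Store item in *slot.  This takes over the caller's
           reference to item (no INCREF) and drops the reference
           to the old value, if there was one. */
        void toy_setitem(toyobject **slot, toyobject *item)
        {
                toyobject *old = *slot;
                *slot = item;
                if (old != NULL)
                        TOY_DECREF(old);
        }
\end{verbatim}

A caller that wants to keep using an object after storing it must
call \code{TOY_INCREF()} first. Otherwise the object may be freed as
soon as the slot is overwritten, just as the text warns for replaced
tuple and list elements.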


\section{Using C++}

It is possible to write extension modules in C++. Some restrictions
apply: since the main program (the Python interpreter) is compiled and
linked by the C compiler, global or static objects with constructors
cannot be used. All functions that will be called directly or
indirectly (i.e. via function pointers) by the Python interpreter will
have to be declared using \code{extern "C"}; this applies to all
`methods' as well as to the module's initialization function.
It is unnecessary to enclose the Python header files in
\code{extern "C" \{...\}} --- they do this already.


\chapter{Embedding Python in another application}

Embedding Python is similar to extending it, but not quite. The
difference is that when you extend Python, the main program of the
application is still the Python interpreter, while if you embed
Python, the main program may have nothing to do with Python ---
instead, some parts of the application occasionally call the Python
interpreter to run some Python code.

So if you are embedding Python, you are providing your own main
program. One of the things this main program has to do is initialize
the Python interpreter. At the very least, you have to call the
function \code{initall()}. There are optional calls to pass command
line arguments to Python. Then later you can call the interpreter
from any part of the application.

There are several different ways to call the interpreter: you can pass
a string containing Python statements to \code{run_command()}, or you
can pass a stdio file pointer and a file name (for identification in
error messages only) to \code{run_script()}. You can also call the
lower-level operations described in the previous chapters to construct
and use Python objects.

A simple demo of embedding Python can be found in the directory
\file{<pythonroot>/embed}.


\section{Using C++}

It is also possible to embed Python in a C++ program; how this is done
exactly will depend on the details of the C++ system used; in general
you will need to write the main program in C++, and use the C++
compiler to compile and link your program. There is no need to
recompile Python itself with C++.


\chapter{Dynamic Loading}

On some systems (e.g., SunOS, SGI Irix) it is possible to configure
Python to support dynamic loading of modules implemented in C. Once
configured and installed it's trivial to use: if a Python program
executes \code{import foo}, the search for modules tries to find a
file \file{foomodule.o} in the module search path, and if one is
found, it is linked with the executing binary and executed. Once
linked, the module acts just like a built-in module.

The advantages of dynamic loading are twofold: the `core' Python
binary gets smaller, and users can extend Python with their own
modules implemented in C without having to build and maintain their
own copy of the Python interpreter. There are also disadvantages:
dynamic loading isn't available on all systems (this just means that
on some systems you have to use static loading), and dynamically
loading a module that was compiled for a different version of Python
(e.g., with a different representation of objects) may dump core.

{\bf NEW:} Under SunOS (all versions) and IRIX 5.x, dynamic loading
now uses shared libraries and is always configured. See the
end of this chapter for how to create a dynamically loadable module.


\section{Configuring and building the interpreter for dynamic loading}

(Ignore this section for SunOS and IRIX 5.x --- on these systems
dynamic loading is always configured.)

Dynamic loading is a little complicated to configure, since its
implementation is extremely system dependent, and there are no
really standard libraries or interfaces for it. I'm using an
extremely simple interface, which basically needs only one function:

\begin{verbatim}
        funcptr = dl_loadmod(binary, object, function)
\end{verbatim}

where \code{binary} is the pathname of the currently executing program
(not just \code{argv[0]}!), \code{object} is the name of the \samp{.o}
file to be dynamically loaded, and \code{function} is the name of a
function in the module. If the dynamic loading succeeds,
\code{dl_loadmod()} returns a pointer to the named function; if not, it
returns \code{NULL}.

I provide two implementations of \code{dl_loadmod()}: one for SGI machines
running Irix 4.0 (written by my colleague Jack Jansen), and one that
is a thin interface layer for Wilson Ho's (GNU) dynamic loading
package \dfn{dld} (version 3.2.3). Dld implements a much more powerful
version of dynamic loading than needed (including unlinking), but it
does not support System V's COFF object file format. It currently
supports only VAX (Ultrix), Sun 3 (SunOS 3.4 and 4.0), SPARCstation
(SunOS 4.0), Sequent Symmetry (Dynix), and Atari ST (from the dld
3.2.3 README file). Dld is part of the standard Python distribution;
if you didn't get it, many ftp archive sites carry dld these days, so
it won't be hard to get hold of it if you need it (using archie).

(If you don't know where to get dld, try anonymous ftp to
\file{wuarchive.wustl.edu:/mirrors2/gnu/dld-3.2.3.tar.Z}. Jack's dl
package can be found at \file{ftp.cwi.nl:/pub/python/dl.tar.Z}.)

To build a Python interpreter capable of dynamic loading, you need to
edit the Makefile. Basically you must uncomment the lines starting
with \samp{\#DL_}, but you must also edit some of the lines to choose
which version of \code{dl_loadmod()} to use, and fill in the pathname
of the dld library if you use it. And, of course, you must first build
\code{dl_loadmod()} and dld, if used. (This is now done through the
Configure script. For SunOS and IRIX 5.x, everything is now automatic.)


\section{Building a dynamically loadable module}

Building an object file usable by dynamic loading is easy, if you
follow these rules (substitute your module name for \code{foo}
everywhere):

\begin{itemize}

\item
The source filename must be \file{foomodule.c}, so the object
name is \file{foomodule.o}.

\item
The module must be written as a (statically linked) Python extension
module (described in an earlier chapter) except that you must not add
a line for it to \file{config.c} and must not link it with the
main Python interpreter.

\item
The module's initialization function must be called \code{initfoo}; it
must install the module in \code{sys.modules} (generally by calling
\code{initmodule()} as explained earlier).

\item
The module must be compiled with \samp{-c}. The resulting \samp{.o}
file must not be stripped.

\item
Since the module must include many standard Python include files, it
must be compiled with a \samp{-I} option pointing to the Python source
directory (unless it resides there itself).

\item
On SGI Irix, the compiler flag \samp{-G0} (or \samp{-G 0}) must be passed.
IF THIS IS NOT DONE THE RESULTING CODE WILL NOT WORK.

\item
{\bf NEW:} On SunOS and IRIX 5.x, you must create a shared library
from your \samp{.o} file using the following command (assuming your
module is called \code{foo}):

\begin{verbatim}
        ld -o foomodule.so foomodule.o <any other libraries needed>
\end{verbatim}

and place the resulting \samp{.so} file in the Python search path (not
the \samp{.o} file). Note: on Solaris, you need to pass \samp{-G} to
the loader; on IRIX 5.x, you need to pass \samp{-shared}. Sigh...

\end{itemize}
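
The naming rules in the list above can be summarized in a few lines
of C. This is only an illustration of the convention:
\code{module_names()} is a hypothetical helper written for this
sketch, and it is not how the interpreter itself looks up modules.

\begin{verbatim}
        #include <assert.h>
        #include <stdio.h>
        #include <string.h>

        /* Hypothetical helper: derive, for module `name', the
           object file name ("<name>module.o") and the name of the
           initialization function ("init<name>").  Returns 0 on
           success, -1 if one of the buffers is too small. */
        int module_names(const char *name,
                         char *objfile, size_t objsize,
                         char *initfunc, size_t initsize)
        {
                assert(name != NULL);
                assert(objfile != NULL && initfunc != NULL);
                /* sizeof counts the terminating null byte */
                if (strlen(name) + sizeof("module.o") > objsize)
                        return -1;
                if (strlen(name) + sizeof("init") > initsize)
                        return -1;
                sprintf(objfile, "%smodule.o", name);
                sprintf(initfunc, "init%s", name);
                return 0;
        }
\end{verbatim}

For the module \code{foo} this yields \file{foomodule.o} and
\code{initfoo}, the two names used throughout this chapter.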


\section{Using libraries}

If your dynamically loadable module needs to be linked with one or
more libraries that aren't linked with Python (or if it needs a
routine that isn't used by Python from one of the libraries with which
Python is linked), you must specify a list of libraries to search
after loading the module in a file with extension \samp{.libs} (and
otherwise the same name as your \samp{.o} file). This file should
contain one or more lines containing whitespace-separated absolute
library pathnames. When using the dl interface, \samp{-l...} flags may
also be used (the contents of the file are in fact passed as an option
list to the system linker ld(1)), but the dl-dld interface requires
absolute pathnames. I believe it is possible to specify shared
libraries here.

(On SunOS, any extra libraries must be specified on the \code{ld}
command that creates the \samp{.so} file.)
887
888\section{Caveats}
889
890Dynamic loading requires that \code{main}'s \code{argv[0]} contains
891the pathname or at least filename of the Python interpreter.
892Unfortunately, when executing a directly executable Python script (an
893executable file with \samp{\#!...} on the first line), the kernel
894overwrites \code{argv[0]} with the name of the script. There is no
895easy way around this, so executable Python scripts cannot use
896dynamically loaded modules. (You can always write a simple shell
897script that calls the Python interpreter with the script as its
898input.)
899
900When using dl, the overlay is first converted into an `overlay' for
901the current process by the system linker (\code{ld}). The overlay is
902saved as a file with extension \samp{.ld}, either in the directory
903where the \samp{.o} file lives or (if that can't be written) in a
904temporary directory. An existing \samp{.ld} file resulting from a
905previous run (not from a temporary directory) is used, bypassing the
906(costly) linking phase, provided its version matches the \samp{.o}
907file and the current binary. (See the \code{dl} man page for more
908details.)
909
910
Guido van Rossum7a2dba21993-11-05 14:45:11 +0000911\input{ext.ind}
912
913\end{document}