\documentstyle[twoside,11pt,myformat,times]{report}

\title{\bf Extending and Embedding the Python Interpreter}

\author{
    Guido van Rossum \\
    Dept. CST, CWI, P.O. Box 94079 \\
    1090 GB Amsterdam, The Netherlands \\
    E-mail: {\tt guido@cwi.nl}
}

% Tell \index to actually write the .idx file
\makeindex

\begin{document}

\pagenumbering{roman}

\maketitle

\begin{abstract}

\noindent
This document describes how to write modules in C or C++ to extend the
Python interpreter. It also describes how to use Python as an
`embedded' language, and how extension modules can be loaded
dynamically (at run time) into the interpreter, if the operating
system supports this feature.

\end{abstract}

\pagebreak

{
\parskip = 0mm
\tableofcontents
}

\pagebreak

\pagenumbering{arabic}


\chapter{Extending Python with C or C++ code}


\section{Introduction}

It is quite easy to add non-standard built-in modules to Python, if
you know how to program in C. A built-in module known to the Python
programmer as \code{foo} is generally implemented by a file called
\file{foomodule.c}. All but the most essential standard built-in
modules also adhere to this convention, and in fact some of them form
excellent examples of how to create an extension.

Extension modules can do two things that can't be done directly in
Python: they can implement new data types, and they can make system
calls or call C library functions. Since the latter is usually the
most important reason for adding an extension, I'll concentrate on
adding `wrappers' around C library functions; the concrete example
uses the wrapper for
\code{system()} in module \code{posix}, found in (of course) the file
\file{posixmodule.c}.

It is important not to be impressed by the size and complexity of
the average extension module; much of this is straightforward
`boilerplate' code (starting right with the copyright notice)!

Let's skip the boilerplate and have a look at an interesting function
in \file{posixmodule.c} first:

\begin{verbatim}
    static object *
    posix_system(self, args)
        object *self;
        object *args;
    {
        char *command;
        int sts;
        if (!getargs(args, "s", &command))
            return NULL;
        sts = system(command);
        return mkvalue("i", sts);
    }
\end{verbatim}

This is the prototypical top-level function in an extension module.
It will be called (we'll see later how this is made possible) when the
Python program executes statements like

\begin{verbatim}
    >>> import posix
    >>> sts = posix.system('ls -l')
\end{verbatim}

There is a straightforward translation from the arguments to the call
in Python (here the single value \code{'ls -l'}) to the arguments that
are passed to the C function. The C function always has two
parameters, conventionally named \var{self} and \var{args}. In this
example, \var{self} will always be a \code{NULL} pointer, since this is a
function, not a method (this is done so that the interpreter doesn't
have to understand two different types of C functions).

The \var{args} parameter will be a pointer to a Python object, or
\code{NULL} if the Python function/method was called without
arguments. It is necessary to do full argument type checking on each
call, since otherwise the Python user would be able to cause the
Python interpreter to `dump core' by passing the wrong arguments to a
function in an extension module (or no arguments at all). Because
argument checking and converting arguments to C is such a common task,
there's a general function in the Python interpreter which combines
these tasks: \code{getargs()}. It uses a template string to determine
both the types of the Python argument and the types of the C variables
into which it should store the converted values. (More about this
later.)\footnote{
There are convenience macros \code{getstrarg()},
\code{getintarg()}, etc., for many common forms of \code{getargs()}
templates. These are relics from the past; it's better to call
\code{getargs()} directly.}

If \code{getargs()} returns nonzero, the argument list has the right
type and its components have been stored in the variables whose
addresses are passed. If it returns zero, an error has occurred. In
the latter case it has already raised an appropriate exception by
calling \code{err_setstr()}, so the calling function can just return
\code{NULL}.


\section{Intermezzo: errors and exceptions}

An important convention throughout the Python interpreter is the
following: when a function fails, it should set an exception condition
and return an error value (often a \code{NULL} pointer). Exceptions are set
in a global variable in the file \file{errors.c}; if this variable is
\code{NULL} no exception has occurred. A second variable is the
`associated value' of the exception.

The file \file{errors.h} declares a host of \code{err_*} functions to set
various types of exceptions. The most common one is \code{err_setstr()} --- its
arguments are an exception object (e.g. \code{RuntimeError} --- actually it
can be any string object) and a C string indicating the cause of the
error (this is converted to a string object and stored as the
`associated value' of the exception). Another useful function is
\code{err_errno()}, which only takes an exception argument and
constructs the associated value by inspection of the (UNIX) global
variable \code{errno}.

You can test non-destructively whether an exception has been set with
\code{err_occurred()}. However, most code never calls
\code{err_occurred()} to see whether an error occurred or not, but
relies on error return values from the functions it calls instead:

When a function that calls another function detects that the called
function has failed, it should return an error value but not set an
exception condition --- one is already set. The caller is then supposed
to also return an error indication to {\em its} caller, again {\em without}
calling \code{err_setstr()}, and so on --- the most detailed cause of the error
was already reported by the function that detected it in the first
place. Once the error reaches the Python interpreter's main loop,
the loop aborts the currently executing Python code and tries to find an
exception handler specified by the Python programmer.

To ignore an exception set by a function call that failed, the
exception condition must be cleared explicitly by calling
\code{err_clear()}. The only time C code should call
\code{err_clear()} is if it doesn't want to pass the error on to the
interpreter but wants to handle it completely by itself (e.g. by
trying something else or pretending nothing happened).

Finally, the function \code{err_get()} gives you both error variables
{\em and clears them}. Note that even if an error occurred the second one
may be \code{NULL}. I doubt you will need to use this function.

Note that a failing \code{malloc()} call must also be turned into an
exception --- the direct caller of \code{malloc()} (or
\code{realloc()}) must call \code{err_nomem()} and return a failure
indicator itself. All the object-creating functions
(\code{newintobject()} etc.) already do this, so this note matters
only if you call \code{malloc()} directly.

Also note that, with the important exception of \code{getargs()}, functions
that return an integer status usually use 0 for success and -1 for
failure.

Finally, be careful about cleaning up garbage (making appropriate
[\code{X}]\code{DECREF()} calls) when you return an error!
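
As a stand-alone illustration of this convention (it is not the
interpreter's actual code), the following sketch models the two error
variables with hypothetical \code{toy_*} names. The function that
detects the problem sets the exception; its caller merely passes the
failure on, using 0 for success and -1 for failure.

\begin{verbatim}
    #include <stddef.h>

    /* Stand-ins for the interpreter's two error variables. */
    static const char *toy_exception = NULL; /* NULL: no exception */
    static const char *toy_value = NULL;     /* associated value */

    static void toy_err_setstr(const char *exc, const char *msg)
    {
        toy_exception = exc;
        toy_value = msg;
    }

    static int toy_err_occurred(void)
    {
        return toy_exception != NULL;
    }

    static void toy_err_clear(void)
    {
        toy_exception = NULL;
        toy_value = NULL;
    }

    /* Detects the failure itself, so it sets the exception. */
    static int toy_parse_digit(char c, int *out)
    {
        if (c < '0' || c > '9') {
            toy_err_setstr("ValueError", "not a digit");
            return -1;
        }
        *out = c - '0';
        return 0;
    }

    /* Only calls something that can fail, so it passes the error
       on without setting a new exception. */
    static int toy_sum_digits(const char *s, int *sum)
    {
        int d;
        *sum = 0;
        for (; *s != '\0'; s++) {
            if (toy_parse_digit(*s, &d) < 0)
                return -1;  /* exception already set by the callee */
            *sum += d;
        }
        return 0;
    }
\end{verbatim}

A caller that wants to handle the failure itself, rather than pass it
on, would call \code{toy_err_clear()} after seeing the -1 return value.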


\section{Back to the example}

Going back to \code{posix_system()}, you should now be able to understand
this bit:

\begin{verbatim}
    if (!getargs(args, "s", &command))
        return NULL;
\end{verbatim}

It returns \code{NULL} (the error indicator for functions of this kind) if an
error is detected in the argument list, relying on the exception set
by \code{getargs()}. Otherwise, a pointer to the string value of the
argument has been stored in the local variable \code{command}.

If a Python function is called with multiple arguments, the argument
list is turned into a tuple. Python programs can use this feature, for
instance, to explicitly create the tuple containing the arguments
first and make the call later.

The next statement in \code{posix_system()} is a call to the C library
function \code{system()}, passing it the string we just got from
\code{getargs()}:

\begin{verbatim}
    sts = system(command);
\end{verbatim}

Python strings may contain internal null bytes; but if these occur in
this example the rest of the string will be ignored by \code{system()}.

Finally, \code{posix.system()} must return a value: the integer status
returned by the C library \code{system()} function. The example does
this with \code{mkvalue("i", sts)}; the \code{mkvalue()} function is
described later. The same result can be obtained with the function
\code{newintobject()}, which takes a (long) integer as parameter:

\begin{verbatim}
    return newintobject((long)sts);
\end{verbatim}

(Yes, even integers are represented as objects on the heap in Python!)
If you had a function that returned no useful value, you would need
this idiom:

\begin{verbatim}
    INCREF(None);
    return None;
\end{verbatim}

\code{None} is a unique Python object representing `no value'. It differs
from \code{NULL}, which means `error' in most contexts (except when passed as
a function argument --- there it means `no arguments').


\section{The module's function table}

I promised to show how I made the function \code{posix_system()}
available to Python programs. This is shown later in \file{posixmodule.c}:

\begin{verbatim}
    static struct methodlist posix_methods[] = {
        ...
        {"system", posix_system},
        ...
        {NULL, NULL} /* Sentinel */
    };

    void
    initposix()
    {
        (void) initmodule("posix", posix_methods);
    }
\end{verbatim}

(The actual \code{initposix()} is somewhat more complicated, but most
extension modules are indeed as simple as that.) When the Python
program first imports module \code{posix}, \code{initposix()} is called,
which calls \code{initmodule()} with specific parameters. This
creates a module object (which is inserted in the table \code{sys.modules}
under the key \code{'posix'}), and adds built-in-function objects to the
newly created module based upon the table (of type \code{struct methodlist})
that was passed as its second parameter. The function
\code{initmodule()} returns a pointer to the module object that it
creates, but this is unused here. It aborts with a fatal error if the
module could not be initialized satisfactorily.
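
The \code{\{NULL, NULL\}} sentinel is what allows code to walk such a
table without being told its length. How \code{initmodule()} processes
the table internally is not shown in this document; the following
stand-alone sketch, which uses hypothetical \code{toy_*} names and a
simplified function signature, only illustrates the sentinel idiom.

\begin{verbatim}
    #include <stddef.h>
    #include <string.h>

    typedef int (*toy_function)(int);

    struct toy_methodlist {
        const char *name;   /* name as seen by the caller */
        toy_function meth;  /* C function implementing it */
    };

    static int toy_double(int x) { return 2 * x; }
    static int toy_negate(int x) { return -x; }

    static struct toy_methodlist toy_methods[] = {
        {"double", toy_double},
        {"negate", toy_negate},
        {NULL, NULL} /* Sentinel */
    };

    /* Walk the table until the sentinel is reached.
       A NULL result means that no entry has this name. */
    static toy_function
    toy_find_method(const struct toy_methodlist *ml, const char *name)
    {
        for (; ml->name != NULL; ml++) {
            if (strcmp(ml->name, name) == 0)
                return ml->meth;
        }
        return NULL;
    }
\end{verbatim}

Forgetting the sentinel would make the loop run past the end of the
array, which is why every method table must end with it.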


\section{Calling the module initialization function}

There is one more thing to do: telling the Python interpreter to call the
\code{initfoo()} function when it encounters an \samp{import foo} statement.
This is done in the file \file{config.c}. This file contains a table mapping
module names to parameterless void function pointers. You need to add
a declaration of \code{initfoo()} somewhere early in the file, and a
line saying

\begin{verbatim}
    {"foo", initfoo},
\end{verbatim}

to the initializer for \code{inittab[]}. It is conventional to include both
the declaration and the initializer line in preprocessor commands
\code{\#ifdef USE_FOO} / \code{\#endif}, to make it easy to turn the
foo extension on or off. Note that the Macintosh version uses a
different configuration file, distributed as \file{configmac.c}. This
strategy may be extended to other operating system versions, although
usually the standard \file{config.c} file gives a pretty useful starting
point for a new \file{config*.c} file.

And, of course, I forgot the Makefile. This is actually not too hard:
just follow the examples for, say, AMOEBA. Find all occurrences
of the string AMOEBA in the Makefile and do the same for FOO that's
done for AMOEBA...

(Note: if you are using dynamic loading for your extension, you don't
need to edit \file{config.c} and the Makefile. See \file{./DYNLOAD} for more
info about this.)
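
Put together, the two guarded additions to \file{config.c} described
above might look like the following fragment for a hypothetical module
\code{foo}. The declaration of \code{inittab[]} itself is elided,
since its exact form depends on your copy of \file{config.c}.

\begin{verbatim}
    /* Somewhere early in the file */
    #ifdef USE_FOO
    extern void initfoo();
    #endif

    ...

    /* Inside the initializer for inittab[] */
    #ifdef USE_FOO
        {"foo", initfoo},
    #endif
\end{verbatim}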


\section{Calling Python functions from C}

The above concentrates on making C functions accessible to the Python
programmer. The reverse is also often useful: calling Python
functions from C. This is especially the case for libraries that
support so-called `callback' functions. If a C interface makes heavy
use of callbacks, the equivalent Python interface often needs to provide a
callback mechanism to the Python programmer; the implementation may
require calling the Python callback functions from a C callback.
Other uses are also possible.

Fortunately, the Python interpreter is easily called recursively, and
there is a standard interface to call a Python function. I won't
dwell on how to call the Python parser with a particular string as
input --- if you're interested, have a look at the implementation of
the \samp{-c} command line option in \file{pythonmain.c}.

Calling a Python function is easy. First, the Python program must
somehow pass you the Python function object. You should provide a
function (or some other interface) to do this. When this function is
called, save a pointer to the Python function object (be careful to
\code{INCREF()} it!) in a global variable --- or wherever you see fit.
For example, the following function might be part of a module
definition:

\begin{verbatim}
    static object *my_callback;

    static object *
    my_set_callback(dummy, arg)
        object *dummy, *arg;
    {
        XDECREF(my_callback); /* Dispose of previous callback */
        my_callback = arg;
        XINCREF(my_callback); /* Remember new callback */
        /* Boilerplate for "void" return */
        INCREF(None);
        return None;
    }
\end{verbatim}

Later, when it is time to call the function, you call the C function
\code{call_object()}. This function has two arguments, both pointers
to arbitrary Python objects: the Python function, and the argument.
The argument can be \code{NULL} to call the function without arguments. For
example:

\begin{verbatim}
    object *result;
    ...
    /* Time to call the callback */
    result = call_object(my_callback, (object *)NULL);
\end{verbatim}

\code{call_object()} returns a Python object pointer: this is
the return value of the Python function. \code{call_object()} is
`reference-count-neutral' with respect to its arguments, but the
return value is `new': either it is a brand new object, or it is an
existing object whose reference count has been incremented. So, you
should somehow apply \code{DECREF()} to the result, even (especially!) if you
are not interested in its value.

Before you do this, however, it is important to check that the return
value isn't \code{NULL}. If it is, the Python function terminated by raising
an exception. If the C code that called \code{call_object()} is
called from Python, it should now return an error indication to its
Python caller, so the interpreter can print a stack trace, or the
calling Python code can handle the exception. If this is not possible
or desirable, the exception should be cleared by calling
\code{err_clear()}. For example:

\begin{verbatim}
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Depending on the desired interface to the Python callback function,
you may also have to provide an argument to \code{call_object()}. In
some cases the argument is also provided by the Python program,
through the same interface that specified the callback function. It
can then be saved and used in the same manner as the function object.
In other cases, you may have to construct a new object to pass as
argument. In this case you must dispose of it as well. For example,
if you want to pass an integral event code, you might use the
following code:

\begin{verbatim}
    object *argument;
    ...
    argument = newintobject((long)eventcode);
    result = call_object(my_callback, argument);
    DECREF(argument);
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Note the placement of \code{DECREF(argument)} immediately after the call,
before the error check! Also note that strictly speaking this code is
not complete: \code{newintobject()} may run out of memory, and this
should be checked.

In even more complicated cases you may want to pass the callback
function multiple arguments. To this end you have to construct (and
dispose of!) a tuple object. Details (mostly concerned with the
error checks and reference count manipulation) are left as an
exercise for the reader; most of this is also needed when returning
multiple values from a function.
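
One way to avoid most of the manual tuple handling is to let
\code{mkvalue()} (described later in this document) build the argument
tuple. The following fragment is only a sketch in the style of the
examples above: \code{x} and \code{y} are hypothetical \code{int}
variables, and it assumes that \code{call_object()} accepts such a
tuple as the argument list of a two-argument call. Like the earlier
fragments, it must be compiled as part of an extension module.

\begin{verbatim}
    object *arglist;
    object *result;
    ...
    arglist = mkvalue("(ii)", x, y); /* New tuple, NULL on error */
    if (arglist == NULL)
        return NULL; /* Exception already set */
    result = call_object(my_callback, arglist);
    DECREF(arglist); /* Dispose of the tuple */
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Unlike the integer example, this version checks the result of the
object-creating call before using it.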

XXX TO DO: explain objects.

XXX TO DO: defining new object types.


\section{Format strings for {\tt getargs()}}

The \code{getargs()} function is declared in \file{modsupport.h} as
follows:

\begin{verbatim}
    int getargs(object *arg, char *format, ...);
\end{verbatim}

The remaining arguments must be addresses of variables whose type is
determined by the format string. For the conversion to succeed, the
\var{arg} object must match the format and the format must be exhausted.
Note that while \code{getargs()} checks that the Python object really
is of the specified type, it cannot check that the addresses provided
in the call match: if you make mistakes there, your code will probably
dump core.

A format string consists of a single `format unit'. A format unit
describes one Python object; it is usually a single character or a
parenthesized string. The type of a format unit is determined from
its first character, the `format letter':

\begin{description}

\item[\samp{s} (string)]
The Python object must be a string object. The C argument must be a
char** (i.e. the address of a character pointer), and a pointer to
the C string contained in the Python object is stored into it. If the
next character in the format string is \samp{\#}, another C argument
of type int* must be present, and the length of the Python string (not
counting the trailing zero byte) is stored into it.

\item[\samp{z} (string or zero, i.e. \code{NULL})]
Like \samp{s}, but the object may also be None. In this case the
string pointer is set to NULL and if a \samp{\#} is present the size
is set to 0.

\item[\samp{b} (byte, i.e. char interpreted as tiny int)]
The object must be a Python integer. The C argument must be a char*.

\item[\samp{h} (half, i.e. short)]
The object must be a Python integer. The C argument must be a short*.

\item[\samp{i} (int)]
The object must be a Python integer. The C argument must be an int*.

\item[\samp{l} (long)]
The object must be a (plain!) Python integer. The C argument must be
a long*.

\item[\samp{c} (char)]
The Python object must be a string of length 1. The C argument must
be a char*. (Don't pass an int*!)

\item[\samp{f} (float)]
The object must be a Python int or float. The C argument must be a
float*.

\item[\samp{d} (double)]
The object must be a Python int or float. The C argument must be a
double*.

\item[\samp{S} (string object)]
The object must be a Python string. The C argument must be an
object** (i.e. the address of an object pointer). The C program thus
gets back the actual string object that was passed, not just a pointer
to its array of characters and its size as for format character
\samp{s}.

\item[\samp{O} (object)]
The object can be any Python object, including None, but not NULL.
The C argument must be an object**. This can be used if an argument
list must contain objects of a type for which no format letter exists:
the caller must then check that it has the right type.

\item[\samp{(} (tuple)]
The object must be a Python tuple. Following the \samp{(} character
in the format string must come a number of format units describing the
elements of the tuple, followed by a \samp{)} character. Tuple
format units may be nested. (There are no exceptions for empty and
singleton tuples; \samp{()} specifies an empty tuple and \samp{(i)} a
singleton of one integer. Normally you don't want to use the latter,
since it is hard for the user to specify.)

\end{description}

More format characters will probably be added as the need arises. It
should eventually be allowed to use Python long integers wherever
integers are expected, with a range check being performed. (A range
check is in fact always necessary for the \samp{b}, \samp{h} and
\samp{i} format letters, but this is currently not implemented.)

Some example calls:

\begin{verbatim}
    int ok;
    int i, j;
    long k, l;
    char *s;
    int size;

    ok = getargs(args, "(lls)", &k, &l, &s); /* Two longs and a string */
        /* Possible Python call: f(1, 2, 'three') */

    ok = getargs(args, "s", &s); /* A string */
        /* Possible Python call: f('whoops!') */

    ok = getargs(args, ""); /* No arguments */
        /* Python call: f() */

    ok = getargs(args, "((ii)s#)", &i, &j, &s, &size);
        /* A pair of ints and a string, whose size is also returned */
        /* Possible Python call: f((1, 2), 'three') */

    {
        int left, top, right, bottom, h, v;
        ok = getargs(args, "(((ii)(ii))(ii))",
                     &left, &top, &right, &bottom, &h, &v);
        /* A rectangle and a point */
        /* Possible Python call:
           f( ((0, 0), (400, 300)), (10, 10)) */
    }
\end{verbatim}

Note that a format string must consist of a single unit; strings like
\samp{is} and \samp{(ii)s\#} are not valid format strings. (But
\samp{s\#} is.)
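
The single-unit rule can be expressed as a small recursive check. The
following stand-alone sketch is not how \code{getargs()} is
implemented; the \code{toy_*} names are hypothetical, and it accepts
the empty string only because the examples above use \samp{""} for a
call without arguments.

\begin{verbatim}
    #include <stddef.h>
    #include <string.h>

    /* Parse one format unit starting at p.  Return a pointer just
       past the unit, or NULL if no valid unit starts there. */
    static const char *toy_skip_unit(const char *p)
    {
        if (*p == '(') {
            p++;
            while (*p != ')') {
                if (*p == '\0')
                    return NULL; /* Missing ')' */
                p = toy_skip_unit(p);
                if (p == NULL)
                    return NULL;
            }
            return p + 1; /* Skip ')' */
        }
        if (*p != '\0' && strchr("szbhilcfdSO", *p) != NULL) {
            int takes_size = (*p == 's' || *p == 'z');
            p++;
            if (takes_size && *p == '#')
                p++; /* "s#" and "z#" */
            return p;
        }
        return NULL; /* Unknown format letter */
    }

    /* Return 1 if fmt is empty or exactly one format unit. */
    static int toy_format_ok(const char *fmt)
    {
        const char *end;
        if (*fmt == '\0')
            return 1;
        end = toy_skip_unit(fmt);
        return end != NULL && *end == '\0';
    }
\end{verbatim}

With this check, \samp{(lls)} and \samp{s\#} are accepted, while
\samp{is} and \samp{(ii)s\#} are rejected because text remains after
the first unit.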

The \code{getargs()} function does not support variable-length
argument lists. In simple cases you can fake these by trying several
calls to \code{getargs()} until one succeeds, but you must take care to
call \code{err_clear()} before each retry. For example:

\begin{verbatim}
    static object *
    my_method(self, args)
        object *self, *args;
    {
        int i, j, k;

        if (getargs(args, "(ii)", &i, &j)) {
            k = 0; /* Use default third argument */
        }
        else {
            err_clear();
            if (!getargs(args, "(iii)", &i, &j, &k))
                return NULL;
        }
        /* ... use i, j and k here ... */
        INCREF(None);
        return None;
    }
\end{verbatim}

(It is possible to think of an extension to the definition of format
strings to accommodate this directly, e.g., placing a \samp{|} in a
tuple might specify that the remaining arguments are optional.
\code{getargs()} should then return one more than the number of
variables stored into.)

Advanced users note: If you set the `varargs' flag in the method list
for a function, the argument will always be a tuple (the `raw argument
list'). In this case you must enclose single and empty argument lists
in parentheses, e.g., \samp{(s)} and \samp{()}.


\section{The {\tt mkvalue()} function}

This function is the counterpart to \code{getargs()}. It is declared
in \file{modsupport.h} as follows:

\begin{verbatim}
    object *mkvalue(char *format, ...);
\end{verbatim}

It supports exactly the same format letters as \code{getargs()}, but
the arguments (which are input to the function, not output) must be
the values themselves, not the addresses of variables. If a byte, short
or float is passed to a
varargs function, it is widened by the compiler to int or double, so
\samp{b} and \samp{h} are treated as \samp{i} and \samp{f} is
treated as \samp{d}. \samp{S} is treated as \samp{O}, \samp{s} is
treated as \samp{z}. \samp{z\#} and \samp{s\#} are supported: a
second argument specifies the length of the data (negative means use
\code{strlen()}). \samp{S} and \samp{O} add a reference to their
argument (so you should \code{DECREF()} it if you've just created it
and aren't going to use it again).

If the argument for \samp{O} or \samp{S} is a \code{NULL} pointer, it is
assumed that the call producing the argument
found an error and set an exception. Therefore, \code{mkvalue()} will
return \code{NULL} but won't set an exception if one is already set.
If no exception is set, \code{SystemError} is set.

If there is an error in the format string, the \code{SystemError}
exception is set, since it is the calling C code's fault, not that of
the Python user who sees the exception.

Example:

\begin{verbatim}
    return mkvalue("(ii)", 0, 0);
\end{verbatim}

returns a tuple containing two zeros. (Outer parentheses in the
format string are actually superfluous, but you can use them for
compatibility with \code{getargs()}, which requires them if more than
one argument is expected.)


\section{Reference counts}

Here's a useful explanation of \code{INCREF()} and \code{DECREF()}
(after an original by Sjoerd Mullender).

Use \code{XINCREF()} or \code{XDECREF()} instead of \code{INCREF()} /
\code{DECREF()} when the argument may be \code{NULL}.

The basic idea is: if you create an extra reference to an object, you
must \code{INCREF()} it; if you throw away a reference to an object,
you must \code{DECREF()} it. Functions such as
\code{newstringobject()}, \code{newsizedstringobject()},
\code{newintobject()}, etc. create a reference to an object. If you
want to throw away the object thus created, you must use
\code{DECREF()}.

If you put an object into a tuple or list using \code{settupleitem()}
or \code{setlistitem()}, the idea is that you usually don't want to
keep a reference of your own around, so Python does not
\code{INCREF()} the elements. It does \code{DECREF()} the old value.
This means that if you put something into such an object using the
functions Python provides for this, you must \code{INCREF()} the
object if you also want to keep a separate reference to the object around.
Also, if you replace an element, you should \code{INCREF()} the old
element first if you want to keep it. If you didn't \code{INCREF()}
it before you replaced it, you are not allowed to look at it anymore,
since it may have been freed.
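
The rules for storing into a tuple or list are easier to follow in a
small model. The sketch below does not use the Python headers at
all: \code{toyobject}, \code{TOYINCREF()}, \code{TOYDECREF()},
\code{newtoy()} and \code{settoyslot()} are made-up stand-ins that
merely imitate the behaviour described above, with a single slot
playing the role of a list element.

\begin{verbatim}
    #include <assert.h>
    #include <stdlib.h>

    /* Toy stand-ins for illustration; NOT the Python definitions. */
    typedef struct {
        int refcnt;
        int value;
    } toyobject;

    static int toyfreed = 0;        /* objects freed so far */

    #define TOYINCREF(op) ((op)->refcnt++)
    #define TOYDECREF(op) \
        do { \
            if (--(op)->refcnt == 0) { free(op); toyfreed++; } \
        } while (0)

    /* Like newintobject(): the caller gets the only reference. */
    toyobject *
    newtoy(int value)
    {
        toyobject *op = malloc(sizeof(toyobject));

        if (op == NULL)
            return NULL;
        op->refcnt = 1;
        op->value = value;
        return op;
    }

    /* Like setlistitem(): takes over the caller's reference to item
       (no INCREF) and does a DECREF on the old value, if any. */
    void
    settoyslot(toyobject **slot, toyobject *item)
    {
        toyobject *old = *slot;

        *slot = item;
        if (old != NULL)
            TOYDECREF(old);
    }

    /* Walks through the cases above; returns how many objects
       were freed along the way. */
    int
    toydemo(void)
    {
        int before = toyfreed;
        toyobject *slot = NULL;
        toyobject *a = newtoy(1);
        toyobject *b = newtoy(2);

        if (a == NULL || b == NULL) {
            free(a);
            free(b);
            return -1;
        }
        settoyslot(&slot, a);   /* the slot now owns our reference */
        TOYINCREF(a);           /* we want to keep a too: count is 2 */
        settoyslot(&slot, b);   /* a is DECREF'ed, but still alive */
        assert(a->refcnt == 1 && a->value == 1);
        TOYDECREF(a);           /* our own reference goes: a is freed */
        settoyslot(&slot, NULL);        /* b is freed */
        return toyfreed - before;
    }
\end{verbatim}

Without the \code{TOYINCREF(a)}, the second call to
\code{settoyslot()} would already have freed \code{a}, and looking at
\code{a->value} afterwards would be an error. This is exactly the
situation the last sentence above warns about.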

Returning an object to Python (i.e. when your C function returns)
creates a reference to an object, but it does not change the reference
count. When your code does not keep another reference to the object,
you should not \code{INCREF()} or \code{DECREF()} it (assuming it is a
newly created object). When you do keep a reference around, you
should \code{INCREF()} the object. Also, when you return a global
object such as \code{None}, you should \code{INCREF()} it.

If you want to return a tuple, you should consider using
\code{mkvalue()}. This function creates a new tuple with a reference
count of 1 which you can return. If any of the elements you put into
the tuple are objects (format codes \samp{O} or \samp{S}), they
are \code{INCREF()}'ed by \code{mkvalue()}. If you don't want to keep
references to those elements around, you should \code{DECREF()} them
after having called \code{mkvalue()}.

Usually you don't have to worry about arguments. They are
\code{INCREF()}'ed before your function is called and
\code{DECREF()}'ed after your function returns. When you keep a
reference to an argument, you should \code{INCREF()} it and
\code{DECREF()} it when you throw it away. Also, when you return an
argument, you should \code{INCREF()} it, because returning the
argument creates an extra reference to it.

If you use \code{getargs()} to parse the arguments, you can get a
reference to an object (by using \samp{O} in the format string). This
object was not \code{INCREF()}'ed, so you should not \code{DECREF()}
it. If you want to keep the object, you must \code{INCREF()} it
yourself.

If you create your own type of objects, you should use \code{NEWOBJ()}
to create the object. This sets the reference count to 1. If you
want to throw away the object, you should use \code{DECREF()}. When
the reference count reaches zero, your type's \code{dealloc()}
function is called. In it, you should \code{DECREF()} all objects to
which you keep references in your object, but you should not use
\code{DECREF()} on your object. You should use \code{DEL()} instead.
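
The division of labour between \code{DECREF()}, \code{dealloc()} and
\code{DEL()} can again be shown in a model that is independent of the
Python headers. Everything in this sketch is made up for the
purpose: \code{node} is a toy object that may hold a reference to one
other node, \code{newnode()} plays the role of \code{NEWOBJ()}, and
\code{NODEDEL()} that of \code{DEL()}.

\begin{verbatim}
    #include <assert.h>
    #include <stdlib.h>

    /* Toy stand-ins for illustration; NOT the Python definitions. */
    typedef struct node node;

    struct node {
        int refcnt;
        void (*dealloc)(node *);    /* stands in for the type's dealloc */
        node *child;                /* owned reference, may be NULL */
    };

    static int nodeslive = 0;       /* nodes currently allocated */

    #define NODEINCREF(op) ((op)->refcnt++)
    #define NODEDECREF(op) \
        do { \
            if (--(op)->refcnt == 0) (op)->dealloc(op); \
        } while (0)
    #define NODEDEL(op) \
        do { free(op); nodeslive--; } while (0)

    /* Called only when the reference count has dropped to zero. */
    static void
    nodedealloc(node *op)
    {
        assert(op->refcnt == 0);
        if (op->child != NULL)
            NODEDECREF(op->child);  /* give up what we refer to */
        NODEDEL(op);                /* free op itself; no DECREF here */
    }

    /* Plays the role of NEWOBJ(): the count starts at 1. */
    node *
    newnode(node *child)
    {
        node *op = malloc(sizeof(node));

        if (op == NULL)
            return NULL;
        op->refcnt = 1;
        op->dealloc = nodedealloc;
        op->child = child;
        if (child != NULL)
            NODEINCREF(child);      /* we keep our own reference */
        nodeslive++;
        return op;
    }

    /* Returns the number of nodes still allocated at the end. */
    int
    nodedemo(void)
    {
        node *leaf = newnode(NULL);
        node *parent;

        if (leaf == NULL)
            return -1;
        parent = newnode(leaf);     /* leaf now has count 2 */
        if (parent == NULL) {
            NODEDECREF(leaf);
            return -1;
        }
        NODEDECREF(leaf);           /* parent keeps leaf alive */
        NODEDECREF(parent);         /* releases leaf and parent */
        return nodeslive;
    }
\end{verbatim}

Calling \code{NODEDECREF(op)} inside \code{nodedealloc()} instead of
\code{NODEDEL(op)} would decrement a count that is already zero, so
the object would never be freed properly.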


\section{Using C++}

It is possible to write extension modules in C++. Some restrictions
apply: since the main program (the Python interpreter) is compiled and
linked by the C compiler, global or static objects with constructors
cannot be used. All functions that will be called directly or
indirectly (i.e. via function pointers) by the Python interpreter will
have to be declared using \code{extern "C"}; this applies to all
`methods' as well as to the module's initialization function.
It is unnecessary to enclose the Python header files in
\code{extern "C" \{...\}} --- they do this already.


\chapter{Embedding Python in another application}

Embedding Python is similar to extending it, but not quite. The
difference is that when you extend Python, the main program of the
application is still the Python interpreter, while if you embed
Python, the main program may have nothing to do with Python ---
instead, some parts of the application occasionally call the Python
interpreter to run some Python code.

So if you are embedding Python, you are providing your own main
program. One of the things this main program has to do is initialize
the Python interpreter. At the very least, you have to call the
function \code{initall()}. There are optional calls to pass command
line arguments to Python. Then later you can call the interpreter
from any part of the application.

There are several different ways to call the interpreter: you can pass
a string containing Python statements to \code{run_command()}, or you
can pass a stdio file pointer and a file name (for identification in
error messages only) to \code{run_script()}. You can also call the
lower-level operations described in the previous chapters to construct
and use Python objects.

A simple demo of embedding Python can be found in the directory
\file{<pythonroot>/embed}.
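
The order of the calls can be sketched as follows. So that the
sketch compiles without the Python library, it contains trivial
stand-ins for \code{initall()} and \code{run_command()}; in a real
application you would leave these two definitions out and link with
Python instead. The signatures shown, and the convention that
\code{run_command()} returns zero on success, are assumptions that
should be checked against the headers of your Python version;
\code{runscriptlet()} is a made-up name.

\begin{verbatim}
    #include <assert.h>
    #include <stdio.h>

    /* Stand-ins so that this sketch compiles on its own.  In a real
       application these two functions come from the Python library;
       the signatures used here are assumptions. */
    static int initialized = 0;

    void
    initall(void)
    {
        initialized = 1;
    }

    int
    run_command(char *command)
    {
        assert(command != NULL);
        if (!initialized)
            return -1;          /* interpreter not set up yet */
        printf("would run: %s", command);
        return 0;
    }

    /* The part an embedding application would actually contain. */
    int
    runscriptlet(void)
    {
        initall();              /* required before anything else */
        if (run_command("print 'hello from Python'\n") != 0)
            return -1;
        return run_command("import sys\n");
    }
\end{verbatim}

The demo in \file{<pythonroot>/embed} remains the authoritative
example; the sketch only shows that initialization comes first and
that the rest of the application may call the interpreter whenever
it likes afterwards.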


\section{Using C++}

It is also possible to embed Python in a C++ program; how this is done
exactly will depend on the details of the C++ system used; in general
you will need to write the main program in C++, and use the C++
compiler to compile and link your program. There is no need to
recompile Python itself with C++.


\chapter{Dynamic Loading}

On some systems (e.g., SunOS, SGI Irix) it is possible to configure
Python to support dynamic loading of modules implemented in C. Once
configured and installed it's trivial to use: if a Python program
executes \code{import foo}, the search for modules tries to find a
file \file{foomodule.o} in the module search path, and if one is
found, it is linked with the executing binary and executed. Once
linked, the module acts just like a built-in module.

The advantages of dynamic loading are twofold: the `core' Python
binary gets smaller, and users can extend Python with their own
modules implemented in C without having to build and maintain their
own copy of the Python interpreter. There are also disadvantages:
dynamic loading isn't available on all systems (this just means that
on some systems you have to use static loading), and dynamically
loading a module that was compiled for a different version of Python
(e.g., with a different representation of objects) may dump core.

{\bf NEW:} Under SunOS, dynamic loading now uses SunOS shared
libraries and is always configured. See the end of this chapter
for how to create a dynamically loadable module.


\section{Configuring and building the interpreter for dynamic loading}

(Ignore this section for SunOS --- on SunOS dynamic loading is always
configured.)

Dynamic loading is a little complicated to configure, since its
implementation is extremely system dependent, and there are no
really standard libraries or interfaces for it. I'm using an
extremely simple interface, which basically needs only one function:

\begin{verbatim}
    funcptr = dl_loadmod(binary, object, function)
\end{verbatim}

where \code{binary} is the pathname of the currently executing program
(not just \code{argv[0]}!), \code{object} is the name of the \samp{.o}
file to be dynamically loaded, and \code{function} is the name of a
function in the module. If the dynamic loading succeeds,
\code{dl_loadmod()} returns a pointer to the named function; if not, it
returns \code{NULL}.

I provide two implementations of \code{dl_loadmod()}: one for SGI machines
running Irix 4.0 (written by my colleague Jack Jansen), and one that
is a thin interface layer for Wilson Ho's (GNU) dynamic loading
package \dfn{dld} (version 3.2.3). Dld implements a much more powerful
version of dynamic loading than needed (including unlinking), but it
does not support System V's COFF object file format. It currently
supports only VAX (Ultrix), Sun 3 (SunOS 3.4 and 4.0), SPARCstation
(SunOS 4.0), Sequent Symmetry (Dynix), and Atari ST (from the dld
3.2.3 README file). Dld is part of the standard Python distribution;
if you didn't get it, many ftp archive sites carry dld these days, so
it won't be hard to get hold of it if you need it (using archie).

(If you don't know where to get dld, try anonymous ftp to
\file{wuarchive.wustl.edu:/mirrors2/gnu/dld-3.2.3.tar.Z}. Jack's dl
can be found at \file{ftp.cwi.nl:/pub/python/dl.tar.Z}.)

To build a Python interpreter capable of dynamic loading, you need to
edit the Makefile. Basically you must uncomment the lines starting
with \samp{\#DL_}, but you must also edit some of the lines to choose
which version of \code{dl_loadmod} to use, and fill in the pathname of
the dld library if you use it. And, of course, you must first build
\code{dl_loadmod} and dld, if used. (This is now done through the
Configure script. For SunOS, everything is now automatic as long as
the architecture type is \code{sun4}.)


\section{Building a dynamically loadable module}

Building an object file usable by dynamic loading is easy, if you
follow these rules (substitute your module name for \code{foo}
everywhere):

\begin{itemize}

\item
The source filename must be \file{foomodule.c}, so the object
name is \file{foomodule.o}.

\item
The module must be written as a (statically linked) Python extension
module (described in an earlier chapter) except that no line for it
should be added to \file{config.c} and it must not be linked with the
main Python interpreter.

\item
The module's initialization function must be called \code{initfoo}; it
must install the module in \code{sys.modules} (generally by calling
\code{initmodule()} as explained earlier).

\item
The module must be compiled with \samp{-c}. The resulting \samp{.o}
file must not be stripped.

\item
Since the module must include many standard Python include files, it
must be compiled with a \samp{-I} option pointing to the Python source
directory (unless it resides there itself).

\item
On SGI Irix, the compiler flag \samp{-G0} (or \samp{-G 0}) must be passed.
IF THIS IS NOT DONE THE RESULTING CODE WILL NOT WORK.

\item
{\bf NEW:} On SunOS, you must create a shared library from your \samp{.o}
file using the following command (assuming your module is called
\code{foo}):

\begin{verbatim}
    ld -o foomodule.so foomodule.o <any other libraries needed>
\end{verbatim}

and place the resulting \samp{.so} file in the Python search path (not
the \samp{.o} file). Note: on Solaris, you need to pass \samp{-G} to
the loader.

\end{itemize}
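
The two naming rules in this list can be summarized in a few lines of
C. This is only an illustration of the convention, not code taken
from the interpreter: \code{modulenames()} is a made-up helper, and
it covers the \samp{.o} case only (on SunOS the file to look for is
\file{foomodule.so} instead).

\begin{verbatim}
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Made-up helper, not part of Python.  For module name "foo" it
       stores "foomodule.o" in objfile and "initfoo" in initfunc.
       Both buffers must hold size bytes.  Returns 0, or -1 if the
       names would not fit. */
    int
    modulenames(const char *name, char *objfile, char *initfunc,
                size_t size)
    {
        assert(name != NULL && objfile != NULL && initfunc != NULL);
        /* "module.o" is the longer affix; sizeof counts its '\0' */
        if (strlen(name) + sizeof("module.o") > size)
            return -1;
        sprintf(objfile, "%smodule.o", name);
        sprintf(initfunc, "init%s", name);
        return 0;
    }
\end{verbatim}

For the module \code{foo} this gives the file name
\file{foomodule.o} and the function name \code{initfoo}, the same
two strings that appear as \code{object} and \code{function} in the
\code{dl_loadmod()} interface described earlier.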


\section{Using libraries}

If your dynamically loadable module needs to be linked with one or
more libraries that aren't linked with Python (or if it needs a
routine that isn't used by Python from one of the libraries with which
Python is linked), you must specify a list of libraries to search
after loading the module in a file with extension \samp{.libs} (and
otherwise the same name as your \samp{.o} file). This file should
contain one or more lines containing whitespace-separated absolute
library pathnames. When using the dl interface, \samp{-l...} flags
may also be used (the list is in fact passed as an option list to the
system linker ld(1)), but the dl-dld interface requires absolute
pathnames. I believe it is possible to specify shared libraries here.

(On SunOS, any extra libraries must be specified on the \code{ld}
command that creates the \samp{.so} file.)


\section{Caveats}

Dynamic loading requires that \code{main}'s \code{argv[0]} contains
the pathname or at least filename of the Python interpreter.
Unfortunately, when executing a directly executable Python script (an
executable file with \samp{\#!...} on the first line), the kernel
overwrites \code{argv[0]} with the name of the script. There is no
easy way around this, so executable Python scripts cannot use
dynamically loaded modules. (You can always write a simple shell
script that calls the Python interpreter with the script as its
input.)

When using dl, the \samp{.o} file is first converted into an `overlay'
for the current process by the system linker (\code{ld}). The overlay
is saved as a file with extension \samp{.ld}, either in the directory
where the \samp{.o} file lives or (if that can't be written) in a
temporary directory. An existing \samp{.ld} file resulting from a
previous run (not from a temporary directory) is used, bypassing the
(costly) linking phase, provided its version matches the \samp{.o}
file and the current binary. (See the \code{dl} man page for more
details.)


\input{ext.ind}

\end{document}