% Source revision 7a2dba2 (Guido van Rossum, 1993-11-05 14:45:11 +0000)
\documentstyle[twoside,11pt,myformat]{report}

\title{\bf Extending and Embedding the Python Interpreter}

\author{
    Guido van Rossum \\
    Dept. CST, CWI, Kruislaan 413 \\
    1098 SJ Amsterdam, The Netherlands \\
    E-mail: {\tt guido@cwi.nl}
}

% Tell \index to actually write the .idx file
\makeindex

\begin{document}

\pagenumbering{roman}

\maketitle

\begin{abstract}

\noindent
This document describes how you can extend the Python interpreter with
new modules written in C or C++. It also describes how to use the
interpreter as a library package from applications using Python as an
``embedded'' language.

\end{abstract}

\pagebreak

{
\parskip = 0mm
\tableofcontents
}

\pagebreak

\pagenumbering{arabic}

\chapter{Extending Python with C or C++ code}

It is quite easy to add non-standard built-in modules to Python, if
you know how to program in C. A built-in module known to the Python
programmer as foo is generally implemented in a file called
foomodule.c. The standard built-in modules also adhere to this
convention, and in fact some of them form excellent examples of how to
create an extension.

Extension modules can do two things that can't be done directly in
Python: implement new data types and provide access to system calls or
C library functions. Since the latter is usually the most important
reason for adding an extension, I'll concentrate on adding ``wrappers''
around C library functions; the concrete example uses the wrapper for
system() in module posix, found in (of course) the file posixmodule.c.

It is important not to be intimidated by the size and complexity of
the average extension module; much of this is straightforward
``boilerplate'' code (starting right with the copyright notice!).

Let's skip the boilerplate and jump right to an interesting function:

\begin{verbatim}
    static object *
    posix_system(self, args)
        object *self;
        object *args;
    {
        char *command;
        int sts;
        if (!getargs(args, "s", &command))
            return NULL;
        sts = system(command);
        return newintobject((long)sts);
    }
\end{verbatim}

This is the prototypical top-level function in an extension module.
It will be called (we'll see later how this is made possible) when the
Python program executes statements like

\begin{verbatim}
    >>> import posix
    >>> sts = posix.system('ls -l')
\end{verbatim}

There is a straightforward translation from the arguments to the call
in Python (here the single value \verb\'ls -l'\) to the arguments that
are passed to the C function. The C function always has two
parameters, conventionally named `self' and `args'. In this example,
`self' will always be a NULL pointer, since this is a function, not a
method (this is done so that the interpreter doesn't have to
understand two different types of C functions).

The `args' parameter will be a pointer to a Python object, or NULL if
the Python function/method was called without arguments. It is
necessary to do full argument type checking on each call, since
otherwise the Python user could cause a core dump by passing the wrong
arguments (or no arguments at all). Because argument checking and
converting arguments to C is such a common task, there's a general
function in the Python interpreter which combines these tasks:
getargs(). It uses a template string to determine both the types of
the Python argument and the types of the C variables into which it
should store the converted values.

When getargs() returns nonzero, the argument list has the right type
and its components have been stored in the variables whose addresses
are passed. When it returns zero, an error has occurred. In the
latter case it has already raised an appropriate exception by calling
err\_setstr(), so the calling function can just return NULL.

The form of the format string is described in the section on format
strings for getargs() later in this chapter. (There are convenience
macros getstrarg(), getintarg(), etc., for many common forms of
argument lists. These are relics from the past; it's better to call
getargs() directly.)


\section{Intermezzo: errors and exceptions}

An important convention throughout the Python interpreter is the
following: when a function fails, it should set an exception condition
and return an error value (often a NULL pointer). Exceptions are set
in a global variable in the file errors.c; if this variable is NULL no
exception has occurred. A second variable is the ``associated value''
of the exception.

The file errors.h declares a host of err\_* functions to set various
types of exceptions. The most common one is err\_setstr() -- its
arguments are an exception object (e.g. RuntimeError -- actually it
can be any string object) and a C string indicating the cause of the
error (this is converted to a string object and stored as the
``associated value'' of the exception). Another useful function is
err\_errno(), which only takes an exception argument and constructs the
associated value by inspection of the (UNIX) global variable errno.

You can test non-destructively whether an exception has been set with
err\_occurred(). However, most code never calls err\_occurred() to see
whether an error occurred or not, but relies on error return values
from the functions it calls instead.

When a function that calls another function detects that the called
function has failed, it should return an error value but not set an
exception condition -- one is already set. The caller is then
supposed to also return an error indication to {\em its} caller, again
{\em without} calling err\_setstr(), and so on -- the most detailed
cause of the error was already reported by the function that detected
it in the first place. Once the error has reached Python's
interpreter main loop, this aborts the currently executing Python code
and tries to find an exception handler specified by the Python
programmer.

To ignore an exception set by a function call that failed, the
exception condition must be cleared explicitly by calling err\_clear().
The only time C code should call err\_clear() is if it doesn't want to
pass the error on to the interpreter but wants to handle it completely
by itself (e.g. by trying something else or pretending nothing
happened).

Finally, the function err\_get() gives you both error variables
{\em and clears them}. Note that even if an error occurred the second
one may be NULL. I doubt you will need to use this function.

Note that a failing malloc() call must also be turned into an
exception -- the direct caller of malloc() (or realloc()) must call
err\_nomem() and return a failure indicator itself. All the
object-creating functions (newintobject() etc.) already do this, so
this note only matters if you call malloc() directly.

Also note that, with the important exception of getargs(), functions
that return an integer status usually use 0 for success and -1 for
failure.

Finally, be careful about cleaning up garbage (making appropriate
[X]DECREF() calls) when you return an error!
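
The convention described in this section can be sketched in plain C.
This is only a toy model under stated assumptions, not the
interpreter's code: toy\_err\_setstr(), toy\_err\_occurred() and
toy\_err\_clear() are invented stand-ins that merely mimic the roles of
err\_setstr(), err\_occurred() and err\_clear(), and the two parsing
functions are hypothetical. The point is that the function which
detects the problem sets the exception, while its caller only passes
the failure on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the interpreter's error indicator (hypothetical names). */
static const char *toy_err_type = NULL;   /* NULL means "no exception set" */
static const char *toy_err_value = NULL;  /* the "associated value" */

static void toy_err_setstr(const char *type, const char *msg)
{
    toy_err_type = type;
    toy_err_value = msg;
}

static int toy_err_occurred(void) { return toy_err_type != NULL; }

static void toy_err_clear(void)
{
    toy_err_type = NULL;
    toy_err_value = NULL;
}

/* Detects the error itself: sets the exception and returns -1. */
static int parse_digit(char c)
{
    if (c < '0' || c > '9') {
        toy_err_setstr("ValueError", "not a digit");
        return -1;
    }
    return c - '0';
}

/* Only propagates: returns -1 but does not set the exception again. */
static int parse_two_digits(const char *s)
{
    int hi, lo;
    assert(s != NULL && strlen(s) >= 2);
    if ((hi = parse_digit(s[0])) < 0)
        return -1;
    if ((lo = parse_digit(s[1])) < 0)
        return -1;
    return 10 * hi + lo;
}
```

A caller that wants to recover instead of propagating would call
toy\_err\_clear() after seeing the -1 return value, which mirrors the
use of err\_clear() described above.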


\section{Back to the example}

Going back to posix\_system(), you should now be able to understand
this bit:

\begin{verbatim}
    if (!getargs(args, "s", &command))
        return NULL;
\end{verbatim}

It returns NULL (the error indicator for functions of this kind) if an
error is detected in the argument list, relying on the exception set
by getargs(). Otherwise, a pointer to the string value of the
argument has been stored in the local variable `command'.

If a Python function is called with multiple arguments, the argument
list is turned into a tuple. Python programs can use this feature, for
instance, to explicitly create the tuple containing the arguments
first and make the call later.

The next statement in posix\_system() is a call to the C library
function system(), passing it the string we just got from getargs():

\begin{verbatim}
    sts = system(command);
\end{verbatim}

Python strings may contain internal null bytes; but if these occur in
this example the rest of the string will be ignored by system().
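
The effect of an internal null byte can be seen in a few lines of
standard C. The sketch below does not use any interpreter function;
the names payload, payload\_size and visible\_length() are made up for
illustration. It shows that a routine which treats its argument as a
C string stops at the first zero byte, even though more data is stored
behind it.

```c
#include <assert.h>
#include <string.h>

/* Number of bytes a C library routine such as system() would actually
   look at: everything up to, but not including, the first zero byte. */
static size_t visible_length(const char *data)
{
    return strlen(data);
}

/* Six bytes of data with a zero byte in the middle (the compiler adds
   one more terminating zero byte after the literal). */
static const char payload[] = "ls\0 -l";
static const size_t payload_size = sizeof(payload) - 1;
```

Here payload\_size is 6 while visible\_length(payload) is 2, so a
command built this way would be run as plain `ls'.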

Finally, posix.system() must return a value: the integer status
returned by the C library system() function. This is done by the
function newintobject(), which takes a (long) integer as parameter.

\begin{verbatim}
    return newintobject((long)sts);
\end{verbatim}

(Yes, even integers are represented as objects on the heap in Python!)
If you had a function that returned no useful value, you would need
this idiom:

\begin{verbatim}
    INCREF(None);
    return None;
\end{verbatim}

`None' is a unique Python object representing `no value'. It differs
from NULL, which means `error' in most contexts (except when passed as
a function argument -- there it means `no arguments').


\section{The module's function table}

I promised to show how I made the function posix\_system() available to
Python programs. This is shown later in posixmodule.c:

\begin{verbatim}
    static struct methodlist posix_methods[] = {
        ...
        {"system",  posix_system},
        ...
        {NULL,      NULL}        /* Sentinel */
    };

    void
    initposix()
    {
        (void) initmodule("posix", posix_methods);
    }
\end{verbatim}

(The actual initposix() is somewhat more complicated, but most
extension modules are indeed as simple as that.) When the Python
program first imports module `posix', initposix() is called, which
calls initmodule() with specific parameters. This creates a module
object (which is inserted in the table sys.modules under the key
`posix'), and adds built-in-function objects to the newly created
module based upon the table (of type struct methodlist) that was
passed as its second parameter. The function initmodule() returns a
pointer to the module object that it creates, but this is unused here.
It aborts with a fatal error if the module could not be initialized
satisfactorily.
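
To make the role of the sentinel entry concrete, here is a small
self-contained sketch of a table that is walked until a NULL name is
found. It is not how initmodule() is implemented; struct
toy\_methodlist, toy\_find\_method() and the two sample functions are
assumptions made up for this illustration, and the functions take a
plain int instead of Python objects.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*toy_function)(int);

struct toy_methodlist {
    const char *name;      /* name visible to the caller */
    toy_function meth;     /* C function implementing it */
};

static int toy_double(int x) { return 2 * x; }
static int toy_negate(int x) { return -x; }

static struct toy_methodlist toy_methods[] = {
    {"double", toy_double},
    {"negate", toy_negate},
    {NULL, NULL}           /* Sentinel */
};

/* Walk the table until the sentinel; return NULL if the name is unknown. */
static toy_function
toy_find_method(const struct toy_methodlist *ml, const char *name)
{
    for (; ml->name != NULL; ml++) {
        if (strcmp(ml->name, name) == 0)
            return ml->meth;
    }
    return NULL;
}
```

Because the end of the table is marked by the sentinel, no separate
element count has to be passed along with the table.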


\section{Calling the module initialization function}

There is one more thing to do: telling the Python interpreter to call
the initfoo() function when it encounters an `import foo' statement.
This is done in the file config.c. This file contains a table mapping
module names to pointers to parameterless void functions. You need to
add a declaration of initfoo() somewhere early in the file, and a line
saying

\begin{verbatim}
    {"foo",         initfoo},
\end{verbatim}

to the initializer for inittab[]. It is conventional to include both
the declaration and the initializer line in preprocessor commands
\verb\#ifdef USE_FOO\ / \verb\#endif\, to make it easy to turn the foo
extension on or off. Note that the Macintosh version uses a different
configuration file, distributed as configmac.c. This strategy may be
extended to other operating system versions, although usually the
standard config.c file gives a pretty useful starting point for a new
config*.c file.

And, of course, I forgot the Makefile. This is actually not too hard:
just follow the examples for, say, AMOEBA. Find all occurrences of
the string AMOEBA in the Makefile and do the same for FOO that's done
for AMOEBA...

(Note: if you are using dynamic loading for your extension, you don't
need to edit config.c and the Makefile. See ``./DYNLOAD'' for more
info about this.)


\section{Calling Python functions from C}

The above concentrates on making C functions accessible to the Python
programmer. The reverse is also often useful: calling Python
functions from C. This is especially the case for libraries that
support so-called ``callback'' functions. If a C interface makes heavy
use of callbacks, the equivalent Python interface often needs to
provide a callback mechanism to the Python programmer; the
implementation may require calling the Python callback functions from
a C callback. Other uses are also possible.

Fortunately, the Python interpreter is easily called recursively, and
there is a standard interface to call a Python function. I won't
dwell on how to call the Python parser with a particular string as
input -- if you're interested, have a look at the implementation of
the ``-c'' command line option in pythonmain.c.

Calling a Python function is easy. First, the Python program must
somehow pass you the Python function object. You should provide a
function (or some other interface) to do this. When this function is
called, save a pointer to the Python function object (be careful to
INCREF it!) in a global variable -- or wherever you see fit.
For example, the following function might be part of a module
definition:

\begin{verbatim}
    static object *my_callback;

    static object *
    my_set_callback(dummy, arg)
        object *dummy, *arg;
    {
        XDECREF(my_callback); /* Dispose of previous callback */
        my_callback = arg;
        XINCREF(my_callback); /* Remember new callback */
        /* Boilerplate for "void" return */
        INCREF(None);
        return None;
    }
\end{verbatim}

Later, when it is time to call the function, you call the C function
call\_object(). This function has two arguments, both pointers to
arbitrary Python objects: the Python function, and the argument. The
argument can be NULL to call the function without arguments. For
example:

\begin{verbatim}
    object *result;
    ...
    /* Time to call the callback */
    result = call_object(my_callback, (object *)NULL);
\end{verbatim}

call\_object() returns a Python object pointer: this is
the return value of the Python function. call\_object() is
``reference-count-neutral'' with respect to its arguments, but the
return value is ``new'': either it is a brand new object, or it is an
existing object whose reference count has been incremented. So, you
should somehow apply DECREF to the result, even (especially!) if you
are not interested in its value.

Before you do this, however, it is important to check that the return
value isn't NULL. If it is, the Python function terminated by raising
an exception. If the C code that called call\_object() is called from
Python, it should now return an error indication to its Python caller,
so the interpreter can print a stack trace, or the calling Python code
can handle the exception. If this is not possible or desirable, the
exception should be cleared by calling err\_clear(). For example:

\begin{verbatim}
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Depending on the desired interface to the Python callback function,
you may also have to provide an argument to call\_object(). In some
cases the argument is also provided by the Python program, through the
same interface that specified the callback function. It can then be
saved and used in the same manner as the function object. In other
cases, you may have to construct a new object to pass as argument. In
this case you must dispose of it as well. For example, if you want to
pass an integral event code, you might use the following code:

\begin{verbatim}
    object *argument;
    ...
    argument = newintobject((long)eventcode);
    result = call_object(my_callback, argument);
    DECREF(argument);
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    DECREF(result);
\end{verbatim}

Note the placement of DECREF(argument) immediately after the call,
before the error check! Also note that strictly speaking this code is
not complete: newintobject() may run out of memory, and this should be
checked.

In even more complicated cases you may want to pass the callback
function multiple arguments. To this end you have to construct (and
dispose of!) a tuple object. Details (mostly concerned with the
error checks and reference count manipulation) are left as an
exercise for the reader; most of this is also needed when returning
multiple values from a function.
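
The bookkeeping around a stored callback can be tried out without the
interpreter. The following sketch is a toy model only: toy\_object,
toy\_new(), toy\_incref(), toy\_decref(), toy\_set\_callback() and
toy\_fire\_event() are names invented here, and a ``callable'' is just a
structure holding a C function pointer. It follows the same order of
operations as the examples above: remember the callback with an extra
reference, build the argument, call, release the argument, and only
then check the result. As a small variation, the new callback is
referenced before the old one is released, which also covers the case
where both are the same object.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object (a stand-in, not the interpreter's type). */
typedef struct toy_object {
    int refcnt;
    long ival;                                   /* payload for toy integers */
    struct toy_object *(*call)(struct toy_object *arg); /* set for callables */
} toy_object;

static int toy_live = 0;   /* number of objects currently allocated */

static toy_object *toy_new(long ival, toy_object *(*call)(toy_object *))
{
    toy_object *op = malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->ival = ival;
    op->call = call;
    toy_live++;
    return op;
}

static void toy_incref(toy_object *op) { if (op != NULL) op->refcnt++; }

static void toy_decref(toy_object *op)
{
    if (op != NULL && --op->refcnt == 0) {
        toy_live--;
        free(op);
    }
}

static toy_object *toy_callback = NULL;

static void toy_set_callback(toy_object *func)
{
    toy_incref(func);           /* Remember new callback */
    toy_decref(toy_callback);   /* Dispose of previous callback */
    toy_callback = func;
}

/* Build the argument, call, release the argument, then check the result. */
static int toy_fire_event(long eventcode, long *out)
{
    toy_object *argument, *result;
    if (toy_callback == NULL || toy_callback->call == NULL)
        return -1;
    argument = toy_new(eventcode, NULL);
    if (argument == NULL)
        return -1;                      /* out of memory */
    result = toy_callback->call(argument);
    toy_decref(argument);               /* before the error check */
    if (result == NULL)
        return -1;                      /* pass the error back */
    *out = result->ival;
    toy_decref(result);                 /* the result was a new reference */
    return 0;
}

/* Example callable: returns a new object holding arg + 1. */
static toy_object *toy_successor(toy_object *arg)
{
    return toy_new(arg->ival + 1, NULL);
}
```

The counter toy\_live makes leaks visible: after an event has been
fired, only the callback object itself should still be allocated.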

XXX TO DO: explain objects and reference counting.

XXX TO DO: defining new object types.


\section{Format strings for getargs()}

The getargs() function is declared in ``modsupport.h'' as follows:

\begin{verbatim}
    int getargs(object *arg, char *format, ...);
\end{verbatim}

The remaining arguments must be addresses of variables whose type is
determined by the format string. For the conversion to succeed, the
`arg' object must match the format and the format must be exhausted.
Note that while getargs() checks that the Python object really is of
the specified type, it cannot check that the addresses provided in the
call match: if you make mistakes there, your code will probably dump
core.

A format string consists of a single `format unit'. A format unit
describes one Python object; it is usually a single character or a
parenthesized string. The type of a format unit is determined from
its first character, the `format letter':

428
429's' (string)
430 The Python object must be a string object. The C argument
431 must be a char** (i.e., the address of a character pointer),
432 and a pointer to the C string contained in the Python object
433 is stored into it. If the next character in the format string
434 is \verb\'#'\, another C argument of type int* must be present, and
435 the length of the Python string (not counting the trailing
436 zero byte) is stored into it.
437
438'z' (string or zero, i.e., NULL)
439 Like 's', but the object may also be None. In this case the
440 string pointer is set to NULL and if a \verb\'#'\ is present the size
441 it set to 0.
442
443'b' (byte, i.e., char interpreted as tiny int)
444 The object must be a Python integer. The C argument must be a
445 char*.
446
447'h' (half, i.e., short)
448 The object must be a Python integer. The C argument must be a
449 short*.
450
451'i' (int)
452 The object must be a Python integer. The C argument must be
453 an int*.
454
455'l' (long)
456 The object must be a (plain!) Python integer. The C argument
457 must be a long*.
458
459'c' (char)
460 The Python object must be a string of length 1. The C
461 argument must be a char*. (Don't pass an int*!)
462
463'f' (float)
464 The object must be a Python int or float. The C argument must
465 be a float*.
466
467'd' (double)
468 The object must be a Python int or float. The C argument must
469 be a double*.
470
471'S' (string object)
472 The object must be a Python string. The C argument must be an
473 object** (i.e., the address of an object pointer). The C
474 program thus gets back the actual string object that was
475 passed, not just a pointer to its array of characters and its
476 size as for format character 's'.
477
478'O' (object)
479 The object can be any Python object, including None, but not
480 NULL. The C argument must be an object**. This can be used
481 if an argument list must contain objects of a type for which
482 no format letter exist: the caller must then check that it has
483 the right type.
484
485'(' (tuple)
486 The object must be a Python tuple. Following the '('
487 character in the format string must come a number of format
488 units describing the elements of the tuple, followed by a ')'
489 character. Tuple format units may be nested. (There are no
490 exceptions for empty and singleton tuples; "()" specifies an
491 empty tuple and "(i)" a singleton of one integer. Normally
492 you don't want to use the latter, since it is hard for the
493 user to specify.
494

More format characters will probably be added as the need arises. It
should be possible to pass Python long integers wherever integers are
expected, with a range check. (A range check is in fact always
necessary for the 'b', 'h' and 'i' format letters, but this is
currently not implemented.)


Some example calls:

\begin{verbatim}
    int ok;
    int i, j;
    long k, l;
    char *s;
    int size;

    ok = getargs(args, "(lls)", &k, &l, &s); /* Two longs and a string */
        /* Possible Python call: f(1, 2, 'three') */

    ok = getargs(args, "s", &s); /* A string */
        /* Possible Python call: f('whoops!') */

    ok = getargs(args, ""); /* No arguments */
        /* Python call: f() */

    ok = getargs(args, "((ii)s#)", &i, &j, &s, &size);
        /* A pair of ints and a string, whose size is also returned */
        /* Possible Python call: f((1, 2), 'three') */

    {
        int left, top, right, bottom, h, v;
        ok = getargs(args, "(((ii)(ii))(ii))",
                     &left, &top, &right, &bottom, &h, &v);
        /* A rectangle and a point */
        /* Possible Python call:
           f( ((0, 0), (400, 300)), (10, 10)) */
    }
\end{verbatim}

Note that a format string must consist of a single unit; strings like
\verb\'is'\ and \verb\'(ii)s#'\ are not valid format strings. (But
\verb\'s#'\ is.)
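
The ``single unit'' rule can be checked mechanically. The sketch below
is not part of getargs() and makes no claim about how getargs() parses
its format; skip\_unit() and is\_valid\_format() are hypothetical
helpers written for this illustration. They recognize only the format
letters listed in this section, and they accept the empty string
because the examples use it for a call without arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Skip one format unit starting at p; return the position just past it,
   or NULL if no well-formed unit starts there. */
static const char *skip_unit(const char *p)
{
    if (*p == '(') {
        p++;
        while (*p != ')') {
            if (*p == '\0' || (p = skip_unit(p)) == NULL)
                return NULL;            /* unbalanced or bad element */
        }
        return p + 1;                   /* past the closing ')' */
    }
    if (*p == 's' || *p == 'z')
        return p[1] == '#' ? p + 2 : p + 1;   /* optional length marker */
    if (*p != '\0' && strchr("bhilcfdSO", *p) != NULL)
        return p + 1;
    return NULL;
}

/* A format is accepted if it is empty or consists of exactly one unit. */
static int is_valid_format(const char *format)
{
    const char *end;
    if (*format == '\0')
        return 1;
    end = skip_unit(format);
    return end != NULL && *end == '\0';
}
```

With these helpers, is\_valid\_format() accepts \verb\"s#"\ and
\verb\"((ii)s#)"\ but rejects \verb\"is"\ and \verb\"(ii)s#"\, matching the
note above.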


The getargs() function does not support variable-length argument
lists. In simple cases you can fake these by trying several calls to
getargs() until one succeeds, but you must take care to call
err\_clear() before each retry. For example:

\begin{verbatim}
    static object *
    my_method(self, args)
        object *self, *args;
    {
        int i, j, k;

        if (getargs(args, "(ii)", &i, &j)) {
            k = 0; /* Use default third argument */
        }
        else {
            err_clear();
            if (!getargs(args, "(iii)", &i, &j, &k))
                return NULL;
        }
        /* ... use i, j and k here ... */
        INCREF(None);
        return None;
    }
\end{verbatim}

(It is possible to think of an extension to the definition of format
strings to accommodate this directly, e.g., placing a '|' in a tuple
might specify that the remaining arguments are optional. getargs()
should then return one plus the number of variables stored into.)


Advanced users note: If you set the `varargs' flag in the method list
for a function, the argument will always be a tuple (the `raw argument
list'). In this case you must enclose single and empty argument lists
in parentheses, e.g., ``(s)'' and ``()''.


\section{The mkvalue() function}

This function is the counterpart to getargs(). It is declared in
``modsupport.h'' as follows:

\begin{verbatim}
    object *mkvalue(char *format, ...);
\end{verbatim}

It supports exactly the same format letters as getargs(), but the
arguments (which are input to the function, not output) must not be
pointers, just values. If a byte, short or float is passed to a
varargs function, it is widened by the compiler to int or double, so
'b' and 'h' are treated as 'i' and 'f' is treated as 'd'. 'S' is
treated as 'O', 's' is treated as 'z'. \verb\'z#'\ and \verb\'s#'\
are supported: a second argument specifies the length of the data
(negative means use strlen()). 'S' and 'O' add a reference to their
argument (so you should DECREF it if you've just created it and aren't
going to use it again).

If the argument for 'O' or 'S' is a NULL pointer, it is assumed that
this was caused because the call producing the argument found an error
and set an exception. Therefore, mkvalue() will return NULL but won't
set an exception if one is already set. If no exception is set,
SystemError is set.

If there is an error in the format string, the SystemError exception
is set, since it is the calling C code's fault, not that of the Python
user who sees the exception.

Example:

\begin{verbatim}
    return mkvalue("(ii)", 0, 0);
\end{verbatim}

returns a tuple containing two zeros. (Outer parentheses in the
format string are actually superfluous, but you can use them for
compatibility with getargs(), which requires them if more than one
argument is expected.)
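
The widening of small arguments is a property of C itself and can be
observed with a short varargs function. The function sum\_values()
below is a made-up illustration and has nothing to do with the
implementation of mkvalue(); it merely reuses some of the format
letters. Since char and short arguments arrive as int, and float
arguments arrive as double, the function has to fetch them with those
wider types.

```c
#include <assert.h>
#include <stdarg.h>

/* Sum the arguments described by format.  Because of the default argument
   promotions, 'b', 'h' and 'i' must all be fetched as int, and 'f' and 'd'
   as double; 'l' is fetched as long. */
static double sum_values(const char *format, ...)
{
    va_list ap;
    double total = 0.0;
    va_start(ap, format);
    for (; *format != '\0'; format++) {
        switch (*format) {
        case 'b': case 'h': case 'i':
            total += va_arg(ap, int);
            break;
        case 'l':
            total += (double)va_arg(ap, long);
            break;
        case 'f': case 'd':
            total += va_arg(ap, double);
            break;
        default:
            va_end(ap);
            return -1.0;    /* unknown letter: toy error value */
        }
    }
    va_end(ap);
    return total;
}
```

A call such as sum\_values("bhf", (char)1, (short)2, 0.5f) therefore
works even though no argument is fetched as char, short or float.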

\section{Reference counts}

Here's a useful explanation of INCREF and DECREF by Sjoerd Mullender.

Use XINCREF or XDECREF instead of INCREF/DECREF when the argument may
be NULL.

The basic idea is, if you create an extra reference to an object, you
must INCREF it, if you throw away a reference to an object, you must
DECREF it. Functions such as newstringobject(), newsizedstringobject(),
newintobject(), etc. create a reference to an object. If you want to
throw away the object thus created, you must use DECREF.

If you put an object into a tuple, list, or dictionary, the idea is
that you usually don't want to keep a reference of your own around, so
Python does not INCREF the elements. It does DECREF the old value.
This means that if you put something into such an object using the
functions Python provides for this, you must INCREF the object if you
want to keep a separate reference to the object around. Also, if you
replace an element, you should INCREF the old element first if you
want to keep it. If you didn't INCREF it before you replaced it, you
are not allowed to look at it anymore, since it may have been freed.

Returning an object to Python (i.e., when your module function
returns) creates a reference to an object, but it does not change the
reference count. When your module does not keep another reference to
the object, you should not INCREF or DECREF it. When you do keep a
reference around, you should INCREF the object. Also, when you return
a global object such as None, you should INCREF it.

If you want to return a tuple, you should consider using mkvalue().
mkvalue() creates a new tuple with a reference count of 1 which you can
return. If any of the elements you put into the tuple are objects,
they are INCREFfed by mkvalue(). If you don't want to keep references
to those elements around, you should DECREF them after having called
mkvalue().

Usually you don't have to worry about arguments. They are INCREFfed
before your function is called and DECREFfed after your function
returns. When you keep a reference to an argument, you should INCREF
it and DECREF it when you throw it away. Also, when you return an
argument, you should INCREF it, because returning the argument creates
an extra reference to it.

If you use getargs() to parse the arguments, you can get a reference
to an object (by using ``O'' in the format string). This object was not
INCREFfed, so you should not DECREF it. If you want to keep the
object, you must INCREF it yourself.

If you create your own type of objects, you should use NEWOBJ to
create the object. This sets the reference count to 1. If you want
to throw away the object, you should use DECREF. When the reference
count reaches 0, the dealloc function is called. In it, you should
DECREF all objects to which you keep references in your object, but you
should not use DECREF on your object. You should use DEL instead.
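
The rule about containers is the one that most often causes surprises,
so here is a toy model of it. Nothing below is interpreter code:
toy\_object, toy\_newint(), TOY\_INCREF, toy\_decref() and toy\_setitem()
are stand-ins invented for this sketch, and a ``container'' is reduced
to a single slot. The slot takes over the caller's reference without
an INCREF and DECREFs whatever it held before, as described above for
tuples, lists and dictionaries.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int refcnt; long ival; } toy_object;

static int toy_freed = 0;   /* counts deallocations, for demonstration */

static toy_object *toy_newint(long ival)
{
    toy_object *op = malloc(sizeof *op);
    if (op != NULL) {
        op->refcnt = 1;     /* the creator owns the first reference */
        op->ival = ival;
    }
    return op;
}

#define TOY_INCREF(op) ((op)->refcnt++)

static void toy_decref(toy_object *op)
{
    if (op != NULL && --op->refcnt == 0) {
        toy_freed++;
        free(op);
    }
}

/* Store item in *slot: the slot takes over the caller's reference
   (no INCREF) and the old value is DECREFfed. */
static void toy_setitem(toy_object **slot, toy_object *item)
{
    toy_object *old = *slot;
    *slot = item;
    toy_decref(old);
}
```

A caller that still wants to look at an element after replacing it has
to apply TOY\_INCREF to it first; otherwise the replacement may free the
element.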

\chapter{Embedding Python in another application}

Embedding Python is similar to extending it, but not quite. The
difference is that when you extend Python, the main program of the
application is still the Python interpreter, while if you embed
Python, the main program may have nothing to do with Python --
instead, some parts of the application occasionally call the Python
interpreter to run some Python code.

So if you are embedding Python, you are providing your own main
program. One of the things this main program has to do is initialize
the Python interpreter. At the very least, you have to call the
function initall(). There are optional calls to pass command line
arguments to Python. Then later you can call the interpreter from any
part of the application.

There are several different ways to call the interpreter: you can pass
a string containing Python statements to run\_command(), or you can
pass a stdio file pointer and a file name (for identification in error
messages only) to run\_script(). You can also call the lower-level
operations described (partly) in the file \verb\<pythonroot>/misc/EXTENDING\
to construct and use Python objects.

A simple demo of embedding Python can be found in the directory
\verb\<pythonroot>/embed/\.

\section{Using C++}

It is also possible to embed Python in a C++ program; how this is done
exactly will depend on the details of the C++ system used; in general
you will need to write the main program in C++, enclosing the include
files in \verb\"extern "C" { ... }"\, and compile and link this with
the C++ compiler. (There is no need to recompile Python itself with
C++.)
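
A related technique, shown here only as a sketch, is to put the
\verb\extern "C"\ wrapper into a C header itself, guarded by the
\verb\__cplusplus\ macro, so that the same header can be included from
both C and C++ code. This is a variation on the advice above, not
something the Python headers are claimed to do; the function
toy\_run\_command() is a hypothetical placeholder and does not run any
Python code.

```c
#include <assert.h>
#include <stddef.h>

/* --- what a C header could contain (hypothetical toy_embed.h) --- */
#ifdef __cplusplus
extern "C" {            /* seen only by a C++ compiler */
#endif

int toy_run_command(const char *command);

#ifdef __cplusplus
}
#endif

/* --- C implementation, compiled with the C compiler --- */
int toy_run_command(const char *command)
{
    /* Placeholder: report whether a non-empty command was passed. */
    return command != NULL && command[0] != '\0';
}
```

When a header has no such guard, the C++ main program has to wrap the
include directive itself, as described above.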

\input{ext.ind}

\end{document}