\chapter{Extending Python with C or \Cpp{} \label{intro}}


It is quite easy to add new built-in modules to Python, if you know
how to program in C. Such \dfn{extension modules} can do two things
that can't be done directly in Python: they can implement new built-in
object types, and they can call C library functions and system calls.

To support extensions, the Python API (Application Programmers
Interface) defines a set of functions, macros and variables that
provide access to most aspects of the Python run-time system. The
Python API is incorporated in a C source file by including the header
\code{"Python.h"}.

The compilation of an extension module depends on its intended use as
well as on your system setup; details are given in later chapters.


\section{A Simple Example
         \label{simpleExample}}

Let's create an extension module called \samp{spam} (the favorite food
of Monty Python fans...) and let's say we want to create a Python
interface to the C library function \cfunction{system()}.\footnote{An
interface for this function already exists in the standard module
\module{os} --- it was chosen as a simple and straightforward example.}
This function takes a null-terminated character string as argument and
returns an integer. We want this function to be callable from Python
as follows:

\begin{verbatim}
>>> import spam
>>> status = spam.system("ls -l")
\end{verbatim}

Begin by creating a file \file{spammodule.c}. (Historically, if a
module is called \samp{spam}, the C file containing its implementation
is called \file{spammodule.c}; if the module name is very long, like
\samp{spammify}, the module name can be just \file{spammify.c}.)

The first line of our file can be:

\begin{verbatim}
#include <Python.h>
\end{verbatim}

which pulls in the Python API (you can add a comment describing the
purpose of the module and a copyright notice if you like).

All user-visible symbols defined by \code{"Python.h"} have a prefix of
\samp{Py} or \samp{PY}, except those defined in standard header files.
For convenience, and since they are used extensively by the Python
interpreter, \code{"Python.h"} includes a few standard header files:
\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>}, and
\code{<stdlib.h>}. If the latter header file does not exist on your
system, it declares the functions \cfunction{malloc()},
\cfunction{free()} and \cfunction{realloc()} directly.

The next thing we add to our module file is the C function that will
be called when the Python expression \samp{spam.system(\var{string})}
is evaluated (we'll see shortly how it ends up being called):

\begin{verbatim}
static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    char *command;
    int sts;

    /* Convert the Python argument tuple to a C string */
    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    /* Convert the C int back to a Python integer object */
    return Py_BuildValue("i", sts);
}
\end{verbatim}

There is a straightforward translation from the argument list in
Python (for example, the single expression \code{"ls -l"}) to the
arguments passed to the C function. The C function always has two
arguments, conventionally named \var{self} and \var{args}.

The \var{self} argument is only used when the C function implements a
built-in method, not a function. In the example, \var{self} will
always be a \NULL{} pointer, since we are defining a function, not a
method. (This is done so that the interpreter doesn't have to
understand two different types of C functions.)

The \var{args} argument will be a pointer to a Python tuple object
containing the arguments. Each item of the tuple corresponds to an
argument in the call's argument list. The arguments are Python
objects --- in order to do anything with them in our C function we have
to convert them to C values. The function \cfunction{PyArg_ParseTuple()}
in the Python API checks the argument types and converts them to C
values. It uses a template string to determine the required types of
the arguments as well as the types of the C variables into which to
store the converted values. More about this later.

\cfunction{PyArg_ParseTuple()} returns true (nonzero) if all arguments have
the right type and its components have been stored in the variables
whose addresses are passed. It returns false (zero) if an invalid
argument list was passed. In the latter case it also raises an
appropriate exception so the calling function can return
\NULL{} immediately (as we saw in the example).


\section{Intermezzo: Errors and Exceptions
         \label{errors}}

An important convention throughout the Python interpreter is the
following: when a function fails, it should set an exception condition
and return an error value (usually a \NULL{} pointer). Exceptions
are stored in a static global variable inside the interpreter; if this
variable is \NULL{} no exception has occurred. A second global
variable stores the ``associated value'' of the exception (the second
argument to \keyword{raise}). A third variable contains the stack
traceback in case the error originated in Python code. These three
variables are the C equivalents of the Python variables
\code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback} (see
the section on module \module{sys} in the
\citetitle[../lib/lib.html]{Python Library Reference}). It is
important to know about them to understand how errors are passed
around.

The Python API defines a number of functions to set various types of
exceptions.

The most common one is \cfunction{PyErr_SetString()}. Its arguments
are an exception object and a C string. The exception object is
usually a predefined object like \cdata{PyExc_ZeroDivisionError}. The
C string indicates the cause of the error and is converted to a
Python string object and stored as the ``associated value'' of the
exception.

Another useful function is \cfunction{PyErr_SetFromErrno()}, which only
takes an exception argument and constructs the associated value by
inspection of the global variable \cdata{errno}. The most
general function is \cfunction{PyErr_SetObject()}, which takes two object
arguments, the exception and its associated value. You don't need to
\cfunction{Py_INCREF()} the objects passed to any of these functions.

You can test non-destructively whether an exception has been set with
\cfunction{PyErr_Occurred()}. This returns the current exception object,
or \NULL{} if no exception has occurred. You normally don't need
to call \cfunction{PyErr_Occurred()} to see whether an error occurred in a
function call, since you should be able to tell from the return value.

When a function \var{f} that calls another function \var{g} detects
that the latter fails, \var{f} should itself return an error value
(usually \NULL{} or \code{-1}). It should \emph{not} call one of the
\cfunction{PyErr_*()} functions --- one has already been called by \var{g}.
\var{f}'s caller is then supposed to also return an error indication
to \emph{its} caller, again \emph{without} calling \cfunction{PyErr_*()},
and so on --- the most detailed cause of the error was already
reported by the function that first detected it. Once the error
reaches the Python interpreter's main loop, this aborts the currently
executing Python code and tries to find an exception handler specified
by the Python programmer.
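
The propagation rule can be illustrated with a toy model in plain C.
This is only a sketch and does not use the Python API: the names
\cdata{toy_error}, \cfunction{toy_err_set()}, \cfunction{toy_g()} and
\cfunction{toy_f()} are hypothetical stand-ins for the interpreter's
error indicator and for the functions \var{g} and \var{f} of the text.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the interpreter's global error indicator
   (hypothetical; the real one lives inside the interpreter). */
static const char *toy_error = NULL;

static void toy_err_set(const char *message) { toy_error = message; }
static const char *toy_err_occurred(void) { return toy_error; }

/* g detects the failure first: it sets the error and returns -1. */
static int toy_g(int divisor, int *quotient)
{
    assert(quotient != NULL);
    if (divisor == 0) {
        toy_err_set("division by zero");
        return -1;
    }
    *quotient = 100 / divisor;
    return 0;
}

/* f calls g.  When g fails, f only passes the error value on;
   it does not set the error a second time. */
static int toy_f(int divisor, int *result)
{
    int quotient;

    assert(result != NULL);
    if (toy_g(divisor, &quotient) < 0)
        return -1;
    *result = quotient + 1;
    return 0;
}
```

A caller of \cfunction{toy_f()} can tell from the return value alone
that something went wrong; the message recorded by \cfunction{toy_g()}
is still available through \cfunction{toy_err_occurred()}.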

(There are situations where a module can actually give a more detailed
error message by calling another \cfunction{PyErr_*()} function, and in
such cases it is fine to do so. As a general rule, however, this is
not necessary, and can cause information about the cause of the error
to be lost: most operations can fail for a variety of reasons.)

To ignore an exception set by a function call that failed, the exception
condition must be cleared explicitly by calling \cfunction{PyErr_Clear()}.
The only time C code should call \cfunction{PyErr_Clear()} is if it doesn't
want to pass the error on to the interpreter but wants to handle it
completely by itself (possibly by trying something else, or pretending
nothing went wrong).
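
The ``handle it completely by itself'' case might look as follows in a
plain C model. Again this is a sketch under stated assumptions:
\cdata{toy_error}, \cfunction{toy_err_clear()} and the two lookup
functions are hypothetical and merely mimic the roles of the error
indicator and \cfunction{PyErr_Clear()}.

```c
#include <assert.h>
#include <stddef.h>

static const char *toy_error = NULL;   /* toy error indicator */

static void toy_err_set(const char *message) { toy_error = message; }
static void toy_err_clear(void) { toy_error = NULL; }

/* Fails (sets the error and returns -1) for negative keys. */
static int toy_lookup_fast(int key, int *value)
{
    assert(value != NULL);
    if (key < 0) {
        toy_err_set("negative key");
        return -1;
    }
    *value = key * 10;
    return 0;
}

/* Handles the failure itself: the error is cleared on purpose and a
   fallback value is returned, so the caller never sees the error. */
static int toy_lookup_with_default(int key, int fallback)
{
    int value;

    if (toy_lookup_fast(key, &value) < 0) {
        toy_err_clear();
        return fallback;
    }
    return value;
}
```

Forgetting the call to \cfunction{toy_err_clear()} in this model would
leave a stale error behind that a later, unrelated check could trip
over; that is the situation the explicit clearing rule avoids.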

Every failing \cfunction{malloc()} call must be turned into an
exception --- the direct caller of \cfunction{malloc()} (or
\cfunction{realloc()}) must call \cfunction{PyErr_NoMemory()} and
return a failure indicator itself. All the object-creating functions
(for example, \cfunction{PyInt_FromLong()}) already do this, so this
note is only relevant to those who call \cfunction{malloc()} directly.

Also note that, with the important exception of
\cfunction{PyArg_ParseTuple()} and friends, functions that return an
integer status usually return a positive value or zero for success and
\code{-1} for failure, like \UNIX{} system calls.

Finally, be careful to clean up garbage (by making
\cfunction{Py_XDECREF()} or \cfunction{Py_DECREF()} calls for objects
you have already created) when you return an error indicator!
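
One common way to satisfy the clean-up rule is to funnel every exit
through a single label. The following plain C sketch models that
pattern; \ctype{ToyObject}, \cfunction{toy_new()} and
\cfunction{toy_xdecref()} are hypothetical stand-ins (the latter
playing the role of \cfunction{Py_XDECREF()}), and the \var{fail}
parameter exists only to simulate an allocation failure.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object (hypothetical; not a real PyObject). */
typedef struct {
    int refcount;
    long value;
} ToyObject;

static int toy_live_objects = 0;   /* objects created but not yet freed */

/* Returns a new reference, or NULL when fail is nonzero. */
static ToyObject *toy_new(long value, int fail)
{
    ToyObject *op;

    if (fail)
        return NULL;
    op = malloc(sizeof(ToyObject));
    if (op == NULL)
        return NULL;
    op->refcount = 1;
    op->value = value;
    toy_live_objects++;
    return op;
}

/* Drops one reference and tolerates NULL, like Py_XDECREF(). */
static void toy_xdecref(ToyObject *op)
{
    if (op != NULL && --op->refcount == 0) {
        toy_live_objects--;
        free(op);
    }
}

/* Creates two objects and stores the sum of their values in *sum.
   Returns 0 on success, -1 on failure; every path releases whatever
   was created before returning. */
static int toy_sum_pair(long a, long b, int fail_second, long *sum)
{
    ToyObject *x = NULL, *y = NULL;
    int status = -1;

    assert(sum != NULL);
    x = toy_new(a, 0);
    if (x == NULL)
        goto done;
    y = toy_new(b, fail_second);
    if (y == NULL)
        goto done;              /* x still has to be released */
    *sum = x->value + y->value;
    status = 0;
done:
    toy_xdecref(x);
    toy_xdecref(y);
    return status;
}
```

Because both pointers start out as \NULL{} and the release function
accepts \NULL{}, the clean-up code does not need to know how far the
function got before it failed.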

The choice of which exception to raise is entirely yours. There are
predeclared C objects corresponding to all built-in Python exceptions,
such as \cdata{PyExc_ZeroDivisionError}, which you can use directly.
Of course, you should choose exceptions wisely --- don't use
\cdata{PyExc_TypeError} to mean that a file couldn't be opened (that
should probably be \cdata{PyExc_IOError}). If something's wrong with
the argument list, the \cfunction{PyArg_ParseTuple()} function usually
raises \cdata{PyExc_TypeError}. If you have an argument whose value
must be in a particular range or must satisfy other conditions,
\cdata{PyExc_ValueError} is appropriate.

You can also define a new exception that is unique to your module.
For this, you usually declare a static object variable at the
beginning of your file:

\begin{verbatim}
static PyObject *SpamError;
\end{verbatim}

and initialize it in your module's initialization function
(\cfunction{initspam()}) with an exception object (leaving out
the error checking for now):

\begin{verbatim}
void
initspam(void)
{
    PyObject *m, *d;

    m = Py_InitModule("spam", SpamMethods);
    d = PyModule_GetDict(m);
    /* Create the exception class and publish it as spam.error */
    SpamError = PyErr_NewException("spam.error", NULL, NULL);
    PyDict_SetItemString(d, "error", SpamError);
}
\end{verbatim}

Note that the Python name for the exception object is
\exception{spam.error}. The \cfunction{PyErr_NewException()} function
may create a class with the base class being \exception{Exception}
(unless another class is passed in instead of \NULL), described in the
\citetitle[../lib/lib.html]{Python Library Reference} under ``Built-in
Exceptions.''

Note also that the \cdata{SpamError} variable retains a reference to
the newly created exception class; this is intentional! Since the
exception could be removed from the module by external code, an owned
reference to the class is needed to ensure that it will not be
discarded, causing \cdata{SpamError} to become a dangling pointer.
Should it become a dangling pointer, C code which raises the exception
could cause a core dump or other unintended side effects.


\section{Back to the Example
         \label{backToExample}}

Going back to our example function, you should now be able to
understand this statement:

\begin{verbatim}
    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
\end{verbatim}

It returns \NULL{} (the error indicator for functions returning
object pointers) if an error is detected in the argument list, relying
on the exception set by \cfunction{PyArg_ParseTuple()}. Otherwise the
string value of the argument has been copied to the local variable
\cdata{command}. This is a pointer assignment and you are not supposed
to modify the string to which it points (so in Standard C, the variable
\cdata{command} should properly be declared as \samp{const char
*command}).

The next statement is a call to the \UNIX{} function
\cfunction{system()}, passing it the string we just got from
\cfunction{PyArg_ParseTuple()}:

\begin{verbatim}
    sts = system(command);
\end{verbatim}

Our \function{spam.system()} function must return the value of
\cdata{sts} as a Python object. This is done using the function
\cfunction{Py_BuildValue()}, which is something like the inverse of
\cfunction{PyArg_ParseTuple()}: it takes a format string and an
arbitrary number of C values, and returns a new Python object.
More info on \cfunction{Py_BuildValue()} is given later.

\begin{verbatim}
    return Py_BuildValue("i", sts);
\end{verbatim}

In this case, it will return an integer object. (Yes, even integers
are objects on the heap in Python!)

If you have a C function that returns no useful argument (a function
returning \ctype{void}), the corresponding Python function must return
\code{None}. You need this idiom to do so:

\begin{verbatim}
    Py_INCREF(Py_None);
    return Py_None;
\end{verbatim}

\cdata{Py_None} is the C name for the special Python object
\code{None}. It is a genuine Python object rather than a \NULL{}
pointer, which means ``error'' in most contexts, as we have seen.


\section{The Module's Method Table and Initialization Function
         \label{methodTable}}

I promised to show how \cfunction{spam_system()} is called from Python
programs. First, we need to list its name and address in a ``method
table'':

\begin{verbatim}
static PyMethodDef SpamMethods[] = {
    ...
    {"system",  spam_system, METH_VARARGS},
    ...
    {NULL,      NULL}        /* Sentinel */
};
\end{verbatim}

Note the third entry (\samp{METH_VARARGS}). This is a flag telling
the interpreter the calling convention to be used for the C
function. It should normally always be \samp{METH_VARARGS} or
\samp{METH_VARARGS | METH_KEYWORDS}; a value of \code{0} means that an
obsolete variant of \cfunction{PyArg_ParseTuple()} is used.

When using only \samp{METH_VARARGS}, the function should expect
the Python-level parameters to be passed in as a tuple acceptable for
parsing via \cfunction{PyArg_ParseTuple()}; more information on this
function is provided below.

The \constant{METH_KEYWORDS} bit may be set in the third field if
keyword arguments should be passed to the function. In this case, the
C function should accept a third \samp{PyObject *} parameter which
will be a dictionary of keywords. Use
\cfunction{PyArg_ParseTupleAndKeywords()} to parse the arguments to
such a function.

The method table must be passed to the interpreter in the module's
initialization function. The initialization function must be named
\cfunction{init\var{name}()}, where \var{name} is the name of the
module, and should be the only non-\keyword{static} item defined in
the module file:

\begin{verbatim}
void
initspam(void)
{
    (void) Py_InitModule("spam", SpamMethods);
}
\end{verbatim}

Note that for \Cpp, this function must be declared \code{extern "C"}.

When the Python program imports module \module{spam} for the first
time, \cfunction{initspam()} is called. (See below for comments about
embedding Python.) It calls
\cfunction{Py_InitModule()}, which creates a ``module object'' (which
is inserted in the dictionary \code{sys.modules} under the key
\code{"spam"}), and inserts built-in function objects into the newly
created module based upon the table (an array of \ctype{PyMethodDef}
structures) that was passed as its second argument.
\cfunction{Py_InitModule()} returns a pointer to the module object
that it creates (which is unused here). It aborts with a fatal error
if the module could not be initialized satisfactorily, so the caller
doesn't need to check for errors.
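
The role of the sentinel entry can be seen in a small plain C model of
a name-to-function table. This sketch does not reproduce what
\cfunction{Py_InitModule()} does internally; \ctype{ToyMethodDef},
\cdata{ToyMethods} and \cfunction{toy_lookup()} are hypothetical names
used only to show how a table terminated by a \NULL{} entry can be
walked without knowing its length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*ToyFunction)(int);

typedef struct {
    const char *name;       /* name visible to the caller */
    ToyFunction function;   /* address of the C implementation */
} ToyMethodDef;

static int toy_double(int x) { return 2 * x; }
static int toy_negate(int x) { return -x; }

static ToyMethodDef ToyMethods[] = {
    {"double", toy_double},
    {"negate", toy_negate},
    {NULL, NULL}            /* Sentinel */
};

/* Walk the table until the sentinel is reached; return the matching
   function, or NULL when the name is not listed. */
static ToyFunction toy_lookup(const ToyMethodDef *table, const char *name)
{
    assert(table != NULL && name != NULL);
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table->function;
    }
    return NULL;
}
```

Leaving out the sentinel in such a table would let the loop run past
the end of the array, which is why the last entry matters.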

When embedding Python, the \cfunction{initspam()} function is not
called automatically unless there's an entry in the
\cdata{_PyImport_Inittab} table. The easiest way to handle this is to
statically initialize your statically-linked modules by directly
calling \cfunction{initspam()} after the call to
\cfunction{Py_Initialize()} or \cfunction{PyMac_Initialize()}:

\begin{verbatim}
int main(int argc, char **argv)
{
    /* Pass argv[0] to the Python interpreter */
    Py_SetProgramName(argv[0]);

    /* Initialize the Python interpreter.  Required. */
    Py_Initialize();

    /* Add a static module */
    initspam();
\end{verbatim}

An example may be found in the file \file{Demo/embed/demo.c} in the
Python source distribution.

\strong{Note:} Removing entries from \code{sys.modules} or importing
compiled modules into multiple interpreters within a process (or
following a \cfunction{fork()} without an intervening
\cfunction{exec()}) can create problems for some extension modules.
Extension module authors should exercise caution when initializing
internal data structures.
Note also that the \function{reload()} function can be used with
extension modules, and will call the module initialization function
(\cfunction{initspam()} in the example), but will not load the module
again if it was loaded from a dynamically loadable object file
(\file{.so} on \UNIX, \file{.dll} on Windows).

A more substantial example module is included in the Python source
distribution as \file{Modules/xxmodule.c}. This file may be used as a
template or simply read as an example. The \program{modulator.py}
script included in the source distribution or Windows install provides
a simple graphical user interface for declaring the functions and
objects which a module should implement, and can generate a template
which can be filled in. The script lives in the
\file{Tools/modulator/} directory; see the \file{README} file there
for more information.


\section{Compilation and Linkage
         \label{compilation}}

There are two more things to do before you can use your new extension:
compiling and linking it with the Python system. If you use dynamic
loading, the details depend on the style of dynamic loading your
system uses; see the chapters about building extension modules on
\UNIX{} (chapter \ref{building-on-unix}) and Windows (chapter
\ref{building-on-windows}) for more information about this.
% XXX Add information about MacOS

If you can't use dynamic loading, or if you want to make your module a
permanent part of the Python interpreter, you will have to change the
configuration setup and rebuild the interpreter. Luckily, this is
very simple: just place your file (\file{spammodule.c} for example) in
the \file{Modules/} directory of an unpacked source distribution, add
a line to the file \file{Modules/Setup.local} describing your file:

\begin{verbatim}
spam spammodule.o
\end{verbatim}

and rebuild the interpreter by running \program{make} in the toplevel
directory. You can also run \program{make} in the \file{Modules/}
subdirectory, but then you must first rebuild \file{Makefile}
there by running `\program{make} Makefile'. (This is necessary each
time you change the \file{Setup} file.)

If your module requires additional libraries to link with, these can
be listed on the line in the configuration file as well, for instance:

\begin{verbatim}
spam spammodule.o -lX11
\end{verbatim}

\section{Calling Python Functions from C
         \label{callingPython}}

So far we have concentrated on making C functions callable from
Python. The reverse is also useful: calling Python functions from C.
This is especially the case for libraries that support so-called
``callback'' functions. If a C interface makes use of callbacks, the
equivalent Python often needs to provide a callback mechanism to the
Python programmer; the implementation will require calling the Python
callback functions from a C callback. Other uses are also imaginable.

Fortunately, the Python interpreter is easily called recursively, and
there is a standard interface to call a Python function. (I won't
dwell on how to call the Python parser with a particular string as
input --- if you're interested, have a look at the implementation of
the \programopt{-c} command line option in \file{Python/pythonmain.c}
from the Python source code.)

Calling a Python function is easy. First, the Python program must
somehow pass you the Python function object. You should provide a
function (or some other interface) to do this. When this function is
called, save a pointer to the Python function object (be careful to
\cfunction{Py_INCREF()} it!) in a global variable --- or wherever you
see fit. For example, the following function might be part of a module
definition:

\begin{verbatim}
static PyObject *my_callback = NULL;

static PyObject *
my_set_callback(PyObject *dummy, PyObject *args)
{
    PyObject *result = NULL;
    PyObject *temp;

    if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
        if (!PyCallable_Check(temp)) {
            PyErr_SetString(PyExc_TypeError, "parameter must be callable");
            return NULL;
        }
        Py_XINCREF(temp);         /* Add a reference to new callback */
        Py_XDECREF(my_callback);  /* Dispose of previous callback */
        my_callback = temp;       /* Remember new callback */
        /* Boilerplate to return "None" */
        Py_INCREF(Py_None);
        result = Py_None;
    }
    return result;
}
\end{verbatim}

This function must be registered with the interpreter using the
\constant{METH_VARARGS} flag; this is described in section
\ref{methodTable}, ``The Module's Method Table and Initialization
Function.'' The \cfunction{PyArg_ParseTuple()} function and its
arguments are documented in section \ref{parseTuple}, ``Extracting
Parameters in Extension Functions.''

The macros \cfunction{Py_XINCREF()} and \cfunction{Py_XDECREF()}
increment/decrement the reference count of an object and are safe in
the presence of \NULL{} pointers (but note that \var{temp} will not be
\NULL{} in this context). More information on them is given in section
\ref{refcounts}, ``Reference Counts.''

Later, when it is time to call the function, you call the C function
\cfunction{PyEval_CallObject()}. This function has two arguments, both
pointers to arbitrary Python objects: the Python function, and the
argument list. The argument list must always be a tuple object, whose
length is the number of arguments. To call the Python function with
no arguments, pass an empty tuple; to call it with one argument, pass
a singleton tuple. \cfunction{Py_BuildValue()} returns a tuple when its
format string consists of zero or more format codes between
parentheses. For example:

\begin{verbatim}
    int arg;
    PyObject *arglist;
    PyObject *result;
    ...
    arg = 123;
    ...
    /* Time to call the callback */
    arglist = Py_BuildValue("(i)", arg);
    result = PyEval_CallObject(my_callback, arglist);
    Py_DECREF(arglist);
\end{verbatim}

\cfunction{PyEval_CallObject()} returns a Python object pointer: this is
the return value of the Python function. \cfunction{PyEval_CallObject()} is
``reference-count-neutral'' with respect to its arguments. In the
example a new tuple was created to serve as the argument list, which
is \cfunction{Py_DECREF()}-ed immediately after the call.

The return value of \cfunction{PyEval_CallObject()} is ``new'': either it
is a brand new object, or it is an existing object whose reference
count has been incremented. So, unless you want to save it in a
global variable, you should somehow \cfunction{Py_DECREF()} the result,
even (especially!) if you are not interested in its value.
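
The ownership rules around the call can be traced in a toy model.
The sketch below is plain C and makes several assumptions:
\ctype{ToyObject}, \cfunction{toy_new()}, \cfunction{toy_decref()} and
\cfunction{toy_call()} are hypothetical, and \cfunction{toy_call()}
merely stands in for a call that borrows its argument and hands back a
new reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object (hypothetical; not a real PyObject). */
typedef struct {
    int refcount;
    long value;
} ToyObject;

static int toy_live_objects = 0;   /* objects created but not yet freed */

/* Returns a new reference, or NULL if memory runs out. */
static ToyObject *toy_new(long value)
{
    ToyObject *op = malloc(sizeof(ToyObject));

    if (op == NULL)
        return NULL;
    op->refcount = 1;
    op->value = value;
    toy_live_objects++;
    return op;
}

static void toy_decref(ToyObject *op)
{
    assert(op != NULL && op->refcount > 0);
    if (--op->refcount == 0) {
        toy_live_objects--;
        free(op);
    }
}

/* Stand-in for the call: it borrows arglist (the reference count is
   left unchanged) and returns a new reference, or NULL on failure. */
static ToyObject *toy_call(ToyObject *arglist)
{
    return toy_new(arglist->value * 2);
}

/* Returns 0 on success and -1 on failure. */
static int toy_invoke(long arg, long *out)
{
    ToyObject *arglist, *result;

    assert(out != NULL);
    arglist = toy_new(arg);
    if (arglist == NULL)
        return -1;
    result = toy_call(arglist);
    toy_decref(arglist);        /* we created it, so we release it */
    if (result == NULL)
        return -1;
    *out = result->value;
    toy_decref(result);         /* the result is "new": release it too */
    return 0;
}
```

In this model the counter \cdata{toy_live_objects} returns to zero
after each invocation, which is the bookkeeping the two
\cfunction{Py_DECREF()} calls achieve in the real code.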

Before you do this, however, it is important to check that the return
value isn't \NULL{}. If it is, the Python function terminated by
raising an exception. If the C code that called
\cfunction{PyEval_CallObject()} is called from Python, it should now
return an error indication to its Python caller, so the interpreter
can print a stack trace, or the calling Python code can handle the
exception. If this is not possible or desirable, the exception should
be cleared by calling \cfunction{PyErr_Clear()}. For example:

\begin{verbatim}
    if (result == NULL)
        return NULL; /* Pass error back */
    ...use result...
    Py_DECREF(result);
\end{verbatim}

Depending on the desired interface to the Python callback function,
you may also have to provide an argument list to
\cfunction{PyEval_CallObject()}. In some cases the argument list is
also provided by the Python program, through the same interface that
specified the callback function. It can then be saved and used in the
same manner as the function object. In other cases, you may have to
construct a new tuple to pass as the argument list. The simplest way
to do this is to call \cfunction{Py_BuildValue()}. For example, if
you want to pass an integral event code, you might use the following
code:

\begin{verbatim}
    PyObject *arglist;
    ...
    arglist = Py_BuildValue("(l)", eventcode);
    result = PyEval_CallObject(my_callback, arglist);
    Py_DECREF(arglist);
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    Py_DECREF(result);
\end{verbatim}

Note the placement of \samp{Py_DECREF(arglist)} immediately after the
call, before the error check! Also note that, strictly speaking, this
code is not complete: \cfunction{Py_BuildValue()} may run out of
memory, and this should be checked.


\section{Extracting Parameters in Extension Functions
         \label{parseTuple}}

The \cfunction{PyArg_ParseTuple()} function is declared as follows:

\begin{verbatim}
int PyArg_ParseTuple(PyObject *arg, char *format, ...);
\end{verbatim}

The \var{arg} argument must be a tuple object containing an argument
list passed from Python to a C function. The \var{format} argument
must be a format string, whose syntax is explained below. The
remaining arguments must be addresses of variables whose type is
determined by the format string. For the conversion to succeed, the
\var{arg} object must match the format and the format must be
exhausted. On success, \cfunction{PyArg_ParseTuple()} returns true,
otherwise it returns false and raises an appropriate exception.

Note that while \cfunction{PyArg_ParseTuple()} checks that the Python
arguments have the required types, it cannot check the validity of the
addresses of C variables passed to the call: if you make mistakes
there, your code will probably crash or at least overwrite random bits
in memory. So be careful!

A format string consists of zero or more ``format units''. A format
unit describes one Python object; it is usually a single character or
a parenthesized sequence of format units. With a few exceptions, a
format unit that is not a parenthesized sequence normally corresponds
to a single address argument to \cfunction{PyArg_ParseTuple()}. In the
following description, the quoted form is the format unit; the entry
in (round) parentheses is the Python object type that matches the
format unit; and the entry in [square] brackets is the type of the C
variable(s) whose address should be passed. (Use the \samp{\&}
operator to pass a variable's address.)

Note that any Python object references which are provided to the
caller are \emph{borrowed} references; do not decrement their
reference count!
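
How a format string drives the conversion can be sketched with a
miniature parser written in plain C. This is an illustration only and
not the implementation of \cfunction{PyArg_ParseTuple()}: the types
\ctype{ToyValue} and \ctype{ToyKind} and the function
\cfunction{toy_parse()} are hypothetical, and only the units
\samp{i}, \samp{l} and \samp{s} are understood.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef enum { TOY_INT, TOY_STRING } ToyKind;

/* Toy argument value (hypothetical; stands in for a Python object). */
typedef struct {
    ToyKind kind;
    long ival;          /* valid when kind == TOY_INT */
    const char *sval;   /* valid when kind == TOY_STRING */
} ToyValue;

/* Returns 1 on success and 0 on failure, mirroring the true/false
   convention described in the text.  Each format unit consumes one
   argument and one address from the variable argument list. */
static int toy_parse(const ToyValue *args, size_t nargs,
                     const char *format, ...)
{
    va_list ap;
    size_t i = 0;
    int ok = 1;

    assert(format != NULL);
    va_start(ap, format);
    while (ok && *format != '\0') {
        if (i >= nargs) {
            ok = 0;                 /* format wants more arguments */
            break;
        }
        switch (*format) {
        case 'i':
            if (args[i].kind == TOY_INT)
                *va_arg(ap, int *) = (int)args[i].ival;
            else
                ok = 0;
            break;
        case 'l':
            if (args[i].kind == TOY_INT)
                *va_arg(ap, long *) = args[i].ival;
            else
                ok = 0;
            break;
        case 's':
            if (args[i].kind == TOY_STRING)
                *va_arg(ap, const char **) = args[i].sval; /* borrowed */
            else
                ok = 0;
            break;
        default:
            ok = 0;                 /* unknown format unit */
            break;
        }
        format++;
        i++;
    }
    va_end(ap);
    if (ok && i != nargs)
        ok = 0;                     /* arguments left over */
    return ok;
}
```

As with the real function, nothing in this model can verify that the
addresses passed after the format string have the types the format
promises; a mismatch there is undefined behaviour in C.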

\begin{description}

\item[\samp{s} (string or Unicode object) {[char *]}]
Convert a Python string or Unicode object to a C pointer to a
character string. You must not provide storage for the string
itself; a pointer to an existing string is stored into the character
pointer variable whose address you pass. The C string is
null-terminated. The Python string must not contain embedded null
bytes; if it does, a \exception{TypeError} exception is raised.
Unicode objects are converted to C strings using the default
encoding. If this conversion fails, an \exception{UnicodeError} is
raised.

\item[\samp{s\#} (string, Unicode or any read buffer compatible object)
{[char *, int]}]
This variant on \samp{s} stores into two C variables, the first one a
pointer to a character string, the second one its length. In this
case the Python string may contain embedded null bytes. Unicode
objects pass back a pointer to the default encoded string version of the
object if such a conversion is possible. All other read buffer
compatible objects pass back a reference to the raw internal data
representation.

\item[\samp{z} (string or \code{None}) {[char *]}]
Like \samp{s}, but the Python object may also be \code{None}, in which
case the C pointer is set to \NULL{}.

\item[\samp{z\#} (string or \code{None} or any read buffer compatible object)
{[char *, int]}]
This is to \samp{s\#} as \samp{z} is to \samp{s}.

\item[\samp{u} (Unicode object) {[Py_UNICODE *]}]
Convert a Python Unicode object to a C pointer to a null-terminated
buffer of 16-bit Unicode (UTF-16) data. As with \samp{s}, there is no need
to provide storage for the Unicode data buffer; a pointer to the
existing Unicode data is stored into the Py_UNICODE pointer variable whose
address you pass.

\item[\samp{u\#} (Unicode object) {[Py_UNICODE *, int]}]
This variant on \samp{u} stores into two C variables, the first one
a pointer to a Unicode data buffer, the second one its length.

\item[\samp{es} (string, Unicode object or character buffer compatible
object) {[const char *encoding, char **buffer]}]
This variant on \samp{s} is used for encoding Unicode and objects
convertible to Unicode into a character buffer. It only works for
encoded data without embedded \NULL{} bytes.

The variant reads one C variable and stores into two C variables, the
first one a pointer to an encoding name string (\var{encoding}), and the
second a pointer to a pointer to a character buffer (\var{**buffer},
the buffer used for storing the encoded data).

The encoding name must map to a registered codec. If set to \NULL{},
the default encoding is used.

\cfunction{PyArg_ParseTuple()} will allocate a buffer of the needed
size using \cfunction{PyMem_NEW()}, copy the encoded data into this
buffer and adjust \var{*buffer} to reference the newly allocated
storage. The caller is responsible for calling
\cfunction{PyMem_Free()} to free the allocated buffer after usage.
687
688\item[\samp{et} (string, Unicode object or character buffer compatible
689object) {[const char *encoding, char **buffer]}]
690Same as \samp{es} except that string objects are passed through without
691recoding them. Instead, the implementation assumes that the string
692object uses the encoding passed in as parameter.
693
694\item[\samp{es\#} (string, Unicode object or character buffer compatible
695object) {[const char *encoding, char **buffer, int *buffer_length]}]
This variant on \samp{s\#} is used for encoding Unicode and objects
convertible to Unicode into a character buffer. It takes three C
arguments: the first is read as input and must point to an encoding
name string (\var{encoding}); the second is a pointer to a pointer to a
character buffer (\var{**buffer}, the buffer used for storing the
encoded data); and the third is a pointer to an integer
(\var{*buffer_length}, the buffer length).
703
704The encoding name must map to a registered codec. If set to \NULL{},
705the default encoding is used.
706
707There are two modes of operation:
708
If \var{*buffer} points to a \NULL{} pointer,
710\cfunction{PyArg_ParseTuple()} will allocate a buffer of the needed
711size using \cfunction{PyMem_NEW()}, copy the encoded data into this
712buffer and adjust \var{*buffer} to reference the newly allocated
713storage. The caller is responsible for calling
714\cfunction{PyMem_Free()} to free the allocated buffer after usage.
715
716If \var{*buffer} points to a non-\NULL{} pointer (an already allocated
717buffer), \cfunction{PyArg_ParseTuple()} will use this location as
718buffer and interpret \var{*buffer_length} as buffer size. It will then
719copy the encoded data into the buffer and 0-terminate it. Buffer
720overflow is signalled with an exception.
721
722In both cases, \var{*buffer_length} is set to the length of the
723encoded data without the trailing 0-byte.
724
725\item[\samp{et\#} (string, Unicode object or character buffer compatible
object) {[const char *encoding, char **buffer, int *buffer_length]}]
727Same as \samp{es\#} except that string objects are passed through without
728recoding them. Instead, the implementation assumes that the string
729object uses the encoding passed in as parameter.
730
731\item[\samp{b} (integer) {[char]}]
732Convert a Python integer to a tiny int, stored in a C \ctype{char}.
733
734\item[\samp{h} (integer) {[short int]}]
735Convert a Python integer to a C \ctype{short int}.
736
737\item[\samp{i} (integer) {[int]}]
738Convert a Python integer to a plain C \ctype{int}.
739
740\item[\samp{l} (integer) {[long int]}]
741Convert a Python integer to a C \ctype{long int}.
742
743\item[\samp{c} (string of length 1) {[char]}]
744Convert a Python character, represented as a string of length 1, to a
745C \ctype{char}.
746
747\item[\samp{f} (float) {[float]}]
748Convert a Python floating point number to a C \ctype{float}.
749
750\item[\samp{d} (float) {[double]}]
751Convert a Python floating point number to a C \ctype{double}.
752
753\item[\samp{D} (complex) {[Py_complex]}]
754Convert a Python complex number to a C \ctype{Py_complex} structure.
755
756\item[\samp{O} (object) {[PyObject *]}]
757Store a Python object (without any conversion) in a C object pointer.
758The C program thus receives the actual object that was passed. The
759object's reference count is not increased. The pointer stored is not
760\NULL{}.
761
762\item[\samp{O!} (object) {[\var{typeobject}, PyObject *]}]
763Store a Python object in a C object pointer. This is similar to
764\samp{O}, but takes two C arguments: the first is the address of a
765Python type object, the second is the address of the C variable (of
766type \ctype{PyObject *}) into which the object pointer is stored.
767If the Python object does not have the required type,
768\exception{TypeError} is raised.
769
770\item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
771Convert a Python object to a C variable through a \var{converter}
772function. This takes two arguments: the first is a function, the
773second is the address of a C variable (of arbitrary type), converted
774to \ctype{void *}. The \var{converter} function in turn is called as
775follows:
776
777\var{status}\code{ = }\var{converter}\code{(}\var{object}, \var{address}\code{);}
778
779where \var{object} is the Python object to be converted and
780\var{address} is the \ctype{void *} argument that was passed to
\cfunction{PyArg_ParseTuple()}. The returned \var{status} should be
782\code{1} for a successful conversion and \code{0} if the conversion
783has failed. When the conversion fails, the \var{converter} function
784should raise an exception.
785
786\item[\samp{S} (string) {[PyStringObject *]}]
787Like \samp{O} but requires that the Python object is a string object.
788Raises \exception{TypeError} if the object is not a string object.
789The C variable may also be declared as \ctype{PyObject *}.
790
791\item[\samp{U} (Unicode string) {[PyUnicodeObject *]}]
792Like \samp{O} but requires that the Python object is a Unicode object.
793Raises \exception{TypeError} if the object is not a Unicode object.
794The C variable may also be declared as \ctype{PyObject *}.
795
796\item[\samp{t\#} (read-only character buffer) {[char *, int]}]
797Like \samp{s\#}, but accepts any object which implements the read-only
798buffer interface. The \ctype{char *} variable is set to point to the
799first byte of the buffer, and the \ctype{int} is set to the length of
800the buffer. Only single-segment buffer objects are accepted;
801\exception{TypeError} is raised for all others.
802
803\item[\samp{w} (read-write character buffer) {[char *]}]
804Similar to \samp{s}, but accepts any object which implements the
805read-write buffer interface. The caller must determine the length of
806the buffer by other means, or use \samp{w\#} instead. Only
807single-segment buffer objects are accepted; \exception{TypeError} is
808raised for all others.
809
810\item[\samp{w\#} (read-write character buffer) {[char *, int]}]
811Like \samp{s\#}, but accepts any object which implements the
812read-write buffer interface. The \ctype{char *} variable is set to
813point to the first byte of the buffer, and the \ctype{int} is set to
814the length of the buffer. Only single-segment buffer objects are
815accepted; \exception{TypeError} is raised for all others.
816
817\item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
818The object must be a Python sequence whose length is the number of
819format units in \var{items}. The C arguments must correspond to the
820individual format units in \var{items}. Format units for sequences
821may be nested.
822
823\strong{Note:} Prior to Python version 1.5.2, this format specifier
824only accepted a tuple containing the individual parameters, not an
825arbitrary sequence. Code which previously caused
826\exception{TypeError} to be raised here may now proceed without an
827exception. This is not expected to be a problem for existing code.
828
829\end{description}
830
831It is possible to pass Python long integers where integers are
832requested; however no proper range checking is done --- the most
833significant bits are silently truncated when the receiving field is
834too small to receive the value (actually, the semantics are inherited
835from downcasts in C --- your mileage may vary).
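
Because no range checking is done, an extension that must reject
out-of-range values can parse into a C \ctype{long} with \samp{l} and
check the range itself before narrowing. The following is a minimal
sketch in plain C; the helper name \cfunction{long_to_short()} is
hypothetical and is not part of the Python API.

```c
#include <limits.h>

/* Hypothetical helper (not part of the Python API): store value in *out
 * only if it fits in a short.  Returns 1 on success and 0 if the value
 * is out of range, in which case *out is left untouched. */
static int
long_to_short(long value, short *out)
{
    if (value < SHRT_MIN || value > SHRT_MAX)
        return 0;
    *out = (short)value;
    return 1;
}
```

A function that has parsed its argument with \samp{l} could call such a
helper and raise an exception of its choosing when it returns \code{0}.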
836
837A few other characters have a meaning in a format string. These may
838not occur inside nested parentheses. They are:
839
840\begin{description}
841
842\item[\samp{|}]
843Indicates that the remaining arguments in the Python argument list are
844optional. The C variables corresponding to optional arguments should
845be initialized to their default value --- when an optional argument is
846not specified, \cfunction{PyArg_ParseTuple()} does not touch the contents
847of the corresponding C variable(s).
848
849\item[\samp{:}]
850The list of format units ends here; the string after the colon is used
851as the function name in error messages (the ``associated value'' of
852the exception that \cfunction{PyArg_ParseTuple()} raises).
853
854\item[\samp{;}]
855The list of format units ends here; the string after the semicolon is
856used as the error message \emph{instead} of the default error message.
857Clearly, \samp{:} and \samp{;} mutually exclude each other.
858
859\end{description}
860
861Some example calls:
862
863\begin{verbatim}
864 int ok;
865 int i, j;
866 long k, l;
867 char *s;
868 int size;
869
870 ok = PyArg_ParseTuple(args, ""); /* No arguments */
871 /* Python call: f() */
872\end{verbatim}
873
874\begin{verbatim}
875 ok = PyArg_ParseTuple(args, "s", &s); /* A string */
876 /* Possible Python call: f('whoops!') */
877\end{verbatim}
878
879\begin{verbatim}
880 ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
881 /* Possible Python call: f(1, 2, 'three') */
882\end{verbatim}
883
884\begin{verbatim}
885 ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
886 /* A pair of ints and a string, whose size is also returned */
887 /* Possible Python call: f((1, 2), 'three') */
888\end{verbatim}
889
890\begin{verbatim}
891 {
892 char *file;
893 char *mode = "r";
894 int bufsize = 0;
895 ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
896 /* A string, and optionally another string and an integer */
897 /* Possible Python calls:
898 f('spam')
899 f('spam', 'w')
900 f('spam', 'wb', 100000) */
901 }
902\end{verbatim}
903
904\begin{verbatim}
905 {
906 int left, top, right, bottom, h, v;
907 ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
908 &left, &top, &right, &bottom, &h, &v);
909 /* A rectangle and a point */
910 /* Possible Python call:
911 f(((0, 0), (400, 300)), (10, 10)) */
912 }
913\end{verbatim}
914
915\begin{verbatim}
916 {
917 Py_complex c;
918 ok = PyArg_ParseTuple(args, "D:myfunction", &c);
919 /* a complex, also providing a function name for errors */
920 /* Possible Python call: myfunction(1+2j) */
921 }
922\end{verbatim}
923
924
925\section{Keyword Parameters for Extension Functions
926 \label{parseTupleAndKeywords}}
927
928The \cfunction{PyArg_ParseTupleAndKeywords()} function is declared as
929follows:
930
931\begin{verbatim}
932int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
933 char *format, char **kwlist, ...);
934\end{verbatim}
935
936The \var{arg} and \var{format} parameters are identical to those of the
937\cfunction{PyArg_ParseTuple()} function. The \var{kwdict} parameter
938is the dictionary of keywords received as the third parameter from the
939Python runtime. The \var{kwlist} parameter is a \NULL{}-terminated
940list of strings which identify the parameters; the names are matched
941with the type information from \var{format} from left to right. On
942success, \cfunction{PyArg_ParseTupleAndKeywords()} returns true,
943otherwise it returns false and raises an appropriate exception.
944
945\strong{Note:} Nested tuples cannot be parsed when using keyword
946arguments! Keyword parameters passed in which are not present in the
947\var{kwlist} will cause \exception{TypeError} to be raised.
948
949Here is an example module which uses keywords, based on an example by
950Geoff Philbrick (\email{philbrick@hks.com}):%
951\index{Philbrick, Geoff}
952
953\begin{verbatim}
#include <stdio.h>
#include "Python.h"

static PyObject *
keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds)
{
    int voltage;
    /* Defaults for the optional arguments; left untouched if omitted. */
    char *state = "a stiff";
    char *action = "voom";
    char *type = "Norwegian Blue";

    static char *kwlist[] = {"voltage", "state", "action", "type", NULL};

    if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist,
                                     &voltage, &state, &action, &type))
        return NULL;

    printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",
           action, voltage);
    printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);

    Py_INCREF(Py_None);

    return Py_None;
}

static PyMethodDef keywdarg_methods[] = {
    /* The cast of the function is necessary since PyCFunction values
     * only take two PyObject* parameters, and keywdarg_parrot() takes
     * three.
     */
    {"parrot", (PyCFunction)keywdarg_parrot, METH_VARARGS|METH_KEYWORDS},
    {NULL, NULL}   /* sentinel */
};

void
initkeywdarg(void)
{
    /* Create the module and add the functions */
    Py_InitModule("keywdarg", keywdarg_methods);
}
998\end{verbatim}
999
1000
1001\section{Building Arbitrary Values
1002 \label{buildValue}}
1003
1004This function is the counterpart to \cfunction{PyArg_ParseTuple()}. It is
1005declared as follows:
1006
1007\begin{verbatim}
1008PyObject *Py_BuildValue(char *format, ...);
1009\end{verbatim}
1010
1011It recognizes a set of format units similar to the ones recognized by
1012\cfunction{PyArg_ParseTuple()}, but the arguments (which are input to the
1013function, not output) must not be pointers, just values. It returns a
1014new Python object, suitable for returning from a C function called
1015from Python.
1016
1017One difference with \cfunction{PyArg_ParseTuple()}: while the latter
1018requires its first argument to be a tuple (since Python argument lists
1019are always represented as tuples internally),
1020\cfunction{Py_BuildValue()} does not always build a tuple. It builds
1021a tuple only if its format string contains two or more format units.
1022If the format string is empty, it returns \code{None}; if it contains
1023exactly one format unit, it returns whatever object is described by
1024that format unit. To force it to return a tuple of size 0 or one,
1025parenthesize the format string.
1026
1027When memory buffers are passed as parameters to supply data to build
1028objects, as for the \samp{s} and \samp{s\#} formats, the required data
1029is copied. Buffers provided by the caller are never referenced by the
1030objects created by \cfunction{Py_BuildValue()}. In other words, if
1031your code invokes \cfunction{malloc()} and passes the allocated memory
1032to \cfunction{Py_BuildValue()}, your code is responsible for
1033calling \cfunction{free()} for that memory once
1034\cfunction{Py_BuildValue()} returns.
1035
1036In the following description, the quoted form is the format unit; the
1037entry in (round) parentheses is the Python object type that the format
1038unit will return; and the entry in [square] brackets is the type of
1039the C value(s) to be passed.
1040
1041The characters space, tab, colon and comma are ignored in format
1042strings (but not within format units such as \samp{s\#}). This can be
1043used to make long format strings a tad more readable.
1044
1045\begin{description}
1046
1047\item[\samp{s} (string) {[char *]}]
1048Convert a null-terminated C string to a Python object. If the C
1049string pointer is \NULL{}, \code{None} is used.
1050
1051\item[\samp{s\#} (string) {[char *, int]}]
1052Convert a C string and its length to a Python object. If the C string
1053pointer is \NULL{}, the length is ignored and \code{None} is
1054returned.
1055
1056\item[\samp{z} (string or \code{None}) {[char *]}]
1057Same as \samp{s}.
1058
1059\item[\samp{z\#} (string or \code{None}) {[char *, int]}]
1060Same as \samp{s\#}.
1061
1062\item[\samp{u} (Unicode string) {[Py_UNICODE *]}]
1063Convert a null-terminated buffer of Unicode (UCS-2) data to a Python
1064Unicode object. If the Unicode buffer pointer is \NULL,
1065\code{None} is returned.
1066
1067\item[\samp{u\#} (Unicode string) {[Py_UNICODE *, int]}]
1068Convert a Unicode (UCS-2) data buffer and its length to a Python
1069Unicode object. If the Unicode buffer pointer is \NULL, the length
1070is ignored and \code{None} is returned.
1071
1072\item[\samp{i} (integer) {[int]}]
1073Convert a plain C \ctype{int} to a Python integer object.
1074
1075\item[\samp{b} (integer) {[char]}]
1076Same as \samp{i}.
1077
1078\item[\samp{h} (integer) {[short int]}]
1079Same as \samp{i}.
1080
1081\item[\samp{l} (integer) {[long int]}]
1082Convert a C \ctype{long int} to a Python integer object.
1083
1084\item[\samp{c} (string of length 1) {[char]}]
1085Convert a C \ctype{int} representing a character to a Python string of
1086length 1.
1087
1088\item[\samp{d} (float) {[double]}]
1089Convert a C \ctype{double} to a Python floating point number.
1090
1091\item[\samp{f} (float) {[float]}]
1092Same as \samp{d}.
1093
1094\item[\samp{D} (complex) {[Py_complex *]}]
1095Convert a C \ctype{Py_complex} structure to a Python complex number.
1096
1097\item[\samp{O} (object) {[PyObject *]}]
1098Pass a Python object untouched (except for its reference count, which
1099is incremented by one). If the object passed in is a \NULL{}
pointer, it is assumed that this happened because the call producing
1101the argument found an error and set an exception. Therefore,
1102\cfunction{Py_BuildValue()} will return \NULL{} but won't raise an
1103exception. If no exception has been raised yet,
1104\cdata{PyExc_SystemError} is set.
1105
1106\item[\samp{S} (object) {[PyObject *]}]
1107Same as \samp{O}.
1108
1109\item[\samp{U} (object) {[PyObject *]}]
1110Same as \samp{O}.
1111
1112\item[\samp{N} (object) {[PyObject *]}]
1113Same as \samp{O}, except it doesn't increment the reference count on
1114the object. Useful when the object is created by a call to an object
1115constructor in the argument list.
1116
1117\item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
1118Convert \var{anything} to a Python object through a \var{converter}
1119function. The function is called with \var{anything} (which should be
1120compatible with \ctype{void *}) as its argument and should return a
1121``new'' Python object, or \NULL{} if an error occurred.
1122
1123\item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
1124Convert a sequence of C values to a Python tuple with the same number
1125of items.
1126
1127\item[\samp{[\var{items}]} (list) {[\var{matching-items}]}]
1128Convert a sequence of C values to a Python list with the same number
1129of items.
1130
1131\item[\samp{\{\var{items}\}} (dictionary) {[\var{matching-items}]}]
1132Convert a sequence of C values to a Python dictionary. Each pair of
1133consecutive C values adds one item to the dictionary, serving as key
1134and value, respectively.
1135
1136\end{description}
1137
1138If there is an error in the format string, the
1139\cdata{PyExc_SystemError} exception is raised and \NULL{} returned.
1140
1141Examples (to the left the call, to the right the resulting Python value):
1142
1143\begin{verbatim}
1144 Py_BuildValue("") None
1145 Py_BuildValue("i", 123) 123
1146 Py_BuildValue("iii", 123, 456, 789) (123, 456, 789)
1147 Py_BuildValue("s", "hello") 'hello'
1148 Py_BuildValue("ss", "hello", "world") ('hello', 'world')
1149 Py_BuildValue("s#", "hello", 4) 'hell'
1150 Py_BuildValue("()") ()
1151 Py_BuildValue("(i)", 123) (123,)
1152 Py_BuildValue("(ii)", 123, 456) (123, 456)
1153 Py_BuildValue("(i,i)", 123, 456) (123, 456)
1154 Py_BuildValue("[i,i]", 123, 456) [123, 456]
1155 Py_BuildValue("{s:i,s:i}",
1156 "abc", 123, "def", 456) {'abc': 123, 'def': 456}
1157 Py_BuildValue("((ii)(ii)) (ii)",
1158 1, 2, 3, 4, 5, 6) (((1, 2), (3, 4)), (5, 6))
1159\end{verbatim}
1160
1161
1162\section{Reference Counts
1163 \label{refcounts}}
1164
1165In languages like C or \Cpp{}, the programmer is responsible for
1166dynamic allocation and deallocation of memory on the heap. In C,
1167this is done using the functions \cfunction{malloc()} and
1168\cfunction{free()}. In \Cpp{}, the operators \keyword{new} and
\keyword{delete} are used with essentially the same meaning; they are
typically implemented using \cfunction{malloc()} and
\cfunction{free()}, so we'll restrict the following discussion to the
latter.
1173
1174Every block of memory allocated with \cfunction{malloc()} should
1175eventually be returned to the pool of available memory by exactly one
1176call to \cfunction{free()}. It is important to call
1177\cfunction{free()} at the right time. If a block's address is
1178forgotten but \cfunction{free()} is not called for it, the memory it
1179occupies cannot be reused until the program terminates. This is
1180called a \dfn{memory leak}. On the other hand, if a program calls
1181\cfunction{free()} for a block and then continues to use the block, it
1182creates a conflict with re-use of the block through another
1183\cfunction{malloc()} call. This is called \dfn{using freed memory}.
1184It has the same bad consequences as referencing uninitialized data ---
1185core dumps, wrong results, mysterious crashes.
1186
1187Common causes of memory leaks are unusual paths through the code. For
1188instance, a function may allocate a block of memory, do some
1189calculation, and then free the block again. Now a change in the
1190requirements for the function may add a test to the calculation that
1191detects an error condition and can return prematurely from the
1192function. It's easy to forget to free the allocated memory block when
1193taking this premature exit, especially when it is added later to the
1194code. Such leaks, once introduced, often go undetected for a long
1195time: the error exit is taken only in a small fraction of all calls,
1196and most modern machines have plenty of virtual memory, so the leak
1197only becomes apparent in a long-running process that uses the leaking
1198function frequently. Therefore, it's important to prevent leaks from
1199happening by having a coding convention or strategy that minimizes
this kind of error.
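
One such convention is to route every exit taken after a successful
allocation through a single cleanup point, so that an error test added
later cannot skip the call to \cfunction{free()}. The sketch below is
plain C, and the function \cfunction{checksum_copy()} is purely
illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative only: copy text into a scratch buffer and sum its bytes.
 * Returns the sum, or -1 on error.  Every path after a successful
 * malloc() leaves through "done", so the buffer is freed exactly once. */
static long
checksum_copy(const char *text)
{
    long result = -1;
    size_t i, n;
    unsigned char *scratch;

    if (text == NULL)
        return -1;              /* nothing allocated yet */
    n = strlen(text);
    scratch = malloc(n + 1);
    if (scratch == NULL)
        return -1;              /* allocation failed, nothing to free */
    memcpy(scratch, text, n + 1);

    if (n == 0)
        goto done;              /* error test added later: still freed */

    result = 0;
    for (i = 0; i < n; i++)
        result += scratch[i];

done:
    free(scratch);
    return result;
}
```

The early \code{return} statements are safe only because they occur
before anything has been allocated; every later exit goes through the
\code{done} label.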
1201
1202Since Python makes heavy use of \cfunction{malloc()} and
1203\cfunction{free()}, it needs a strategy to avoid memory leaks as well
1204as the use of freed memory. The chosen method is called
1205\dfn{reference counting}. The principle is simple: every object
1206contains a counter, which is incremented when a reference to the
1207object is stored somewhere, and which is decremented when a reference
1208to it is deleted. When the counter reaches zero, the last reference
1209to the object has been deleted and the object is freed.
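
The principle can be sketched in a few lines of plain C. This is a toy
model with hypothetical names (\ctype{Counted},
\cfunction{counted_incref()}, \cfunction{counted_decref()}), not the
interpreter's implementation; the \code{live_objects} counter exists
only to make the effect observable.

```c
#include <stdlib.h>

/* Toy model, not the interpreter's implementation. */
typedef struct {
    int refcount;               /* number of references to this object */
    int value;
} Counted;

static int live_objects = 0;    /* allocated and not yet freed */

/* Create an object; the caller receives the first reference. */
static Counted *
counted_new(int value)
{
    Counted *obj = malloc(sizeof(Counted));
    if (obj == NULL)
        return NULL;
    obj->refcount = 1;
    obj->value = value;
    live_objects++;
    return obj;
}

/* A reference is stored somewhere: increment the counter. */
static void
counted_incref(Counted *obj)
{
    obj->refcount++;
}

/* A reference is deleted: decrement, and free the object at zero. */
static void
counted_decref(Counted *obj)
{
    if (--obj->refcount == 0) {
        live_objects--;
        free(obj);
    }
}
```

After the final call to \cfunction{counted_decref()} the pointer must
not be used again, since the object it referred to has been freed.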
1210
1211An alternative strategy is called \dfn{automatic garbage collection}.
1212(Sometimes, reference counting is also referred to as a garbage
1213collection strategy, hence my use of ``automatic'' to distinguish the
1214two.) The big advantage of automatic garbage collection is that the
1215user doesn't need to call \cfunction{free()} explicitly. (Another claimed
1216advantage is an improvement in speed or memory usage --- this is no
1217hard fact however.) The disadvantage is that for C, there is no
1218truly portable automatic garbage collector, while reference counting
1219can be implemented portably (as long as the functions \cfunction{malloc()}
1220and \cfunction{free()} are available --- which the C Standard guarantees).
1221Maybe some day a sufficiently portable automatic garbage collector
1222will be available for C. Until then, we'll have to live with
1223reference counts.
1224
1225\subsection{Reference Counting in Python
1226 \label{refcountsInPython}}
1227
1228There are two macros, \code{Py_INCREF(x)} and \code{Py_DECREF(x)},
1229which handle the incrementing and decrementing of the reference count.
1230\cfunction{Py_DECREF()} also frees the object when the count reaches zero.
1231For flexibility, it doesn't call \cfunction{free()} directly --- rather, it
1232makes a call through a function pointer in the object's \dfn{type
1233object}. For this purpose (and others), every object also contains a
1234pointer to its type object.
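
The indirection through the type object can be illustrated with a toy
model in plain C. All names here (\ctype{ToyType}, \ctype{ToyObject},
\code{TOY_INCREF}, \code{TOY_DECREF}) are hypothetical, and the layout
is an assumption made for illustration rather than the real object
structure.

```c
#include <stdlib.h>
#include <string.h>

/* Toy model with hypothetical names; not the real object layout. */
struct toy_object;

typedef struct {
    const char *name;
    void (*dealloc)(struct toy_object *);   /* how to free an instance */
} ToyType;

typedef struct toy_object {
    int refcount;
    const ToyType *type;        /* every object points to its type */
    char *text;                 /* owned buffer; used by ToyText only */
} ToyObject;

static int text_buffers_freed = 0;  /* for demonstration only */

static void
plain_dealloc(ToyObject *obj)
{
    free(obj);
}

static void
text_dealloc(ToyObject *obj)
{
    free(obj->text);            /* type-specific cleanup first */
    text_buffers_freed++;
    free(obj);
}

static const ToyType ToyPlain = {"plain", plain_dealloc};
static const ToyType ToyText = {"text", text_dealloc};

#define TOY_INCREF(op) ((op)->refcount++)
/* At zero, call through the type object instead of calling free(). */
#define TOY_DECREF(op) \
    do { if (--(op)->refcount == 0) (op)->type->dealloc(op); } while (0)

/* Create an object of the given type.  text must be NULL for ToyPlain. */
static ToyObject *
toy_new(const ToyType *type, const char *text)
{
    ToyObject *obj = malloc(sizeof(ToyObject));
    if (obj == NULL)
        return NULL;
    obj->refcount = 1;
    obj->type = type;
    obj->text = NULL;
    if (text != NULL) {
        obj->text = malloc(strlen(text) + 1);
        if (obj->text == NULL) {
            free(obj);
            return NULL;
        }
        strcpy(obj->text, text);
    }
    return obj;
}
```

Code that drops a reference does not need to know which kind of object
it holds: the same \code{TOY_DECREF} works for both types, and each type
supplies its own cleanup.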
1235
1236The big question now remains: when to use \code{Py_INCREF(x)} and
1237\code{Py_DECREF(x)}? Let's first introduce some terms. Nobody
1238``owns'' an object; however, you can \dfn{own a reference} to an
1239object. An object's reference count is now defined as the number of
1240owned references to it. The owner of a reference is responsible for
1241calling \cfunction{Py_DECREF()} when the reference is no longer
1242needed. Ownership of a reference can be transferred. There are three
1243ways to dispose of an owned reference: pass it on, store it, or call
1244\cfunction{Py_DECREF()}. Forgetting to dispose of an owned reference
1245creates a memory leak.
1246
1247It is also possible to \dfn{borrow}\footnote{The metaphor of
1248``borrowing'' a reference is not completely correct: the owner still
1249has a copy of the reference.} a reference to an object. The borrower
1250of a reference should not call \cfunction{Py_DECREF()}. The borrower must
1251not hold on to the object longer than the owner from which it was
1252borrowed. Using a borrowed reference after the owner has disposed of
1253it risks using freed memory and should be avoided
1254completely.\footnote{Checking that the reference count is at least 1
1255\strong{does not work} --- the reference count itself could be in
1256freed memory and may thus be reused for another object!}
1257
1258The advantage of borrowing over owning a reference is that you don't
1259need to take care of disposing of the reference on all possible paths
1260through the code --- in other words, with a borrowed reference you
1261don't run the risk of leaking when a premature exit is taken. The
disadvantage of borrowing over owning is that there are some subtle
1263situations where in seemingly correct code a borrowed reference can be
1264used after the owner from which it was borrowed has in fact disposed
1265of it.
1266
1267A borrowed reference can be changed into an owned reference by calling
1268\cfunction{Py_INCREF()}. This does not affect the status of the owner from
1269which the reference was borrowed --- it creates a new owned reference,
and confers full owner responsibilities (the new owner must
dispose of the reference properly, just as the previous owner must).
1272
1273
1274\subsection{Ownership Rules
1275 \label{ownershipRules}}
1276
1277Whenever an object reference is passed into or out of a function, it
1278is part of the function's interface specification whether ownership is
1279transferred with the reference or not.
1280
1281Most functions that return a reference to an object pass on ownership
with the reference. In particular, all functions whose purpose is
1283to create a new object, such as \cfunction{PyInt_FromLong()} and
1284\cfunction{Py_BuildValue()}, pass ownership to the receiver. Even if in
1285fact, in some cases, you don't receive a reference to a brand new
1286object, you still receive ownership of the reference. For instance,
1287\cfunction{PyInt_FromLong()} maintains a cache of popular values and can
1288return a reference to a cached item.
1289
1290Many functions that extract objects from other objects also transfer
1291ownership with the reference, for instance
1292\cfunction{PyObject_GetAttrString()}. The picture is less clear, here,
1293however, since a few common routines are exceptions:
1294\cfunction{PyTuple_GetItem()}, \cfunction{PyList_GetItem()},
1295\cfunction{PyDict_GetItem()}, and \cfunction{PyDict_GetItemString()}
1296all return references that you borrow from the tuple, list or
1297dictionary.
1298
1299The function \cfunction{PyImport_AddModule()} also returns a borrowed
1300reference, even though it may actually create the object it returns:
1301this is possible because an owned reference to the object is stored in
1302\code{sys.modules}.
1303
1304When you pass an object reference into another function, in general,
1305the function borrows the reference from you --- if it needs to store
1306it, it will use \cfunction{Py_INCREF()} to become an independent
1307owner. There are exactly two important exceptions to this rule:
1308\cfunction{PyTuple_SetItem()} and \cfunction{PyList_SetItem()}. These
1309functions take over ownership of the item passed to them --- even if
1310they fail! (Note that \cfunction{PyDict_SetItem()} and friends don't
1311take over ownership --- they are ``normal.'')
1312
1313When a C function is called from Python, it borrows references to its
1314arguments from the caller. The caller owns a reference to the object,
1315so the borrowed reference's lifetime is guaranteed until the function
returns. Only when such a borrowed reference must be stored or passed
on must it be turned into an owned reference by calling
1318\cfunction{Py_INCREF()}.
1319
1320The object reference returned from a C function that is called from
Python must be an owned reference --- ownership is transferred from the
1322function to its caller.
1323
1324
1325\subsection{Thin Ice
1326 \label{thinIce}}
1327
1328There are a few situations where seemingly harmless use of a borrowed
1329reference can lead to problems. These all have to do with implicit
1330invocations of the interpreter, which can cause the owner of a
1331reference to dispose of it.
1332
1333The first and most important case to know about is using
1334\cfunction{Py_DECREF()} on an unrelated object while borrowing a
1335reference to a list item. For instance:
1336
1337\begin{verbatim}
void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0); /* BUG! */
}
1344\end{verbatim}
1345
1346This function first borrows a reference to \code{list[0]}, then
1347replaces \code{list[1]} with the value \code{0}, and finally prints
1348the borrowed reference. Looks harmless, right? But it's not!
1349
1350Let's follow the control flow into \cfunction{PyList_SetItem()}. The list
1351owns references to all its items, so when item 1 is replaced, it has
1352to dispose of the original item 1. Now let's suppose the original
1353item 1 was an instance of a user-defined class, and let's further
1354suppose that the class defined a \method{__del__()} method. If this
1355class instance has a reference count of 1, disposing of it will call
1356its \method{__del__()} method.
1357
1358Since it is written in Python, the \method{__del__()} method can execute
1359arbitrary Python code. Could it perhaps do something to invalidate
1360the reference to \code{item} in \cfunction{bug()}? You bet! Assuming
1361that the list passed into \cfunction{bug()} is accessible to the
1362\method{__del__()} method, it could execute a statement to the effect of
1363\samp{del list[0]}, and assuming this was the last reference to that
1364object, it would free the memory associated with it, thereby
1365invalidating \code{item}.
1366
1367The solution, once you know the source of the problem, is easy:
1368temporarily increment the reference count. The correct version of the
1369function reads:
1370
1371\begin{verbatim}
void
no_bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    Py_INCREF(item);
    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}
1380\end{verbatim}
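
The same sequence of events can be reproduced without the interpreter
in a toy model written in plain C. Everything in the sketch below is
hypothetical (\ctype{ToyItem}, \ctype{ToyList}, the \code{finalizer}
callback standing in for a \method{__del__()} method); it only mirrors
the shape of the problem and of the fix.

```c
#include <stdlib.h>

/* Toy model with hypothetical names; not the interpreter's structures. */
typedef struct toy_item {
    int refcount;
    void (*finalizer)(void *);  /* stands in for a __del__() method */
    void *finalizer_arg;
} ToyItem;

typedef struct {
    ToyItem *slot[2];           /* the list owns one reference per slot */
} ToyList;

static int live_items = 0;

static ToyItem *
item_new(void)
{
    ToyItem *item = malloc(sizeof(ToyItem));
    if (item != NULL) {
        item->refcount = 1;
        item->finalizer = NULL;
        item->finalizer_arg = NULL;
        live_items++;
    }
    return item;
}

static void item_incref(ToyItem *item) { item->refcount++; }

static void
item_decref(ToyItem *item)
{
    if (--item->refcount > 0)
        return;
    if (item->finalizer != NULL)
        item->finalizer(item->finalizer_arg);   /* arbitrary code runs */
    live_items--;
    free(item);
}

/* Replace slot i, taking over the caller's reference to new_item and
 * disposing of the reference to the old occupant. */
static void
list_set_item(ToyList *list, int i, ToyItem *new_item)
{
    ToyItem *old = list->slot[i];
    list->slot[i] = new_item;
    if (old != NULL)
        item_decref(old);
}

/* Finalizer playing the role of "del list[0]". */
static void
drop_first(void *arg)
{
    list_set_item((ToyList *)arg, 0, NULL);
}

/* Same shape as no_bug(): own a reference across the risky call.
 * Returns the refcount of item 0 observed after the replacement. */
static int
replace_second_safely(ToyList *list)
{
    ToyItem *item = list->slot[0];      /* borrowed from the list */
    int seen;

    item_incref(item);                  /* now owned: cannot vanish */
    list_set_item(list, 1, item_new()); /* may run drop_first() */
    seen = item->refcount;              /* safe: we hold a reference */
    item_decref(item);
    return seen;
}
```

If the call to \cfunction{item_incref()} were removed and slot 1 held an
item whose finalizer is \cfunction{drop_first()}, the item would be freed
inside \cfunction{list_set_item()} and the later read of
\code{item->refcount} would touch freed memory.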
1381
1382This is a true story. An older version of Python contained variants
1383of this bug and someone spent a considerable amount of time in a C
1384debugger to figure out why his \method{__del__()} methods would fail...
1385
1386The second case of problems with a borrowed reference is a variant
1387involving threads. Normally, multiple threads in the Python
1388interpreter can't get in each other's way, because there is a global
1389lock protecting Python's entire object space. However, it is possible
1390to temporarily release this lock using the macro
1391\code{Py_BEGIN_ALLOW_THREADS}, and to re-acquire it using
1392\code{Py_END_ALLOW_THREADS}. This is common around blocking I/O
1393calls, to let other threads use the processor while waiting for the I/O to
1394complete. Obviously, the following function has the same problem as
1395the previous one:
1396
1397\begin{verbatim}
void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    Py_BEGIN_ALLOW_THREADS
    /* ...some blocking I/O call... */
    Py_END_ALLOW_THREADS
    PyObject_Print(item, stdout, 0); /* BUG! */
}
1405\end{verbatim}
1406
1407
1408\subsection{NULL Pointers
1409 \label{nullPointers}}
1410
1411In general, functions that take object references as arguments do not
1412expect you to pass them \NULL{} pointers, and will dump core (or
1413cause later core dumps) if you do so. Functions that return object
1414references generally return \NULL{} only to indicate that an
1415exception occurred. The reason for not testing for \NULL{}
1416arguments is that functions often pass the objects they receive on to
other functions --- if each function were to test for \NULL{},
1418there would be a lot of redundant tests and the code would run more
1419slowly.
1420
1421It is better to test for \NULL{} only at the ``source:'' when a
1422pointer that may be \NULL{} is received, for example, from
1423\cfunction{malloc()} or from a function that may raise an exception.
1424
1425The macros \cfunction{Py_INCREF()} and \cfunction{Py_DECREF()}
1426do not check for \NULL{} pointers --- however, their variants
1427\cfunction{Py_XINCREF()} and \cfunction{Py_XDECREF()} do.
1428
1429The macros for checking for a particular object type
1430(\code{Py\var{type}_Check()}) don't check for \NULL{} pointers ---
1431again, there is much code that calls several of these in a row to test
1432an object against various different expected types, and this would
1433generate redundant tests. There are no variants with \NULL{}
1434checking.
1435
1436The C function calling mechanism guarantees that the argument list
1437passed to C functions (\code{args} in the examples) is never
1438\NULL{} --- in fact it guarantees that it is always a tuple.\footnote{
1439These guarantees don't hold when you use the ``old'' style
1440calling convention --- this is still found in much existing code.}
1441
1442It is a severe error to ever let a \NULL{} pointer ``escape'' to
1443the Python user.
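
As a rough illustration of testing at the ``source'', the following
plain C sketch checks the result of \cfunction{malloc()} once, where
the pointer is obtained, and then hands it to a helper that does not
repeat the test. The names \cfunction{upper_dup()} and
\cfunction{copy_upper()} are hypothetical and exist only for this
example; the same pattern is assumed to carry over to Python API calls
that return \NULL{} on failure.

\begin{verbatim}
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Helper that, like most API functions, assumes its arguments are not NULL. */
static void
copy_upper(char *dst, const char *src)
{
    for (; *src != '\0'; src++, dst++)
        *dst = (char)toupper((unsigned char)*src);
    *dst = '\0';
}

/* Returns a newly allocated upper-case copy of s, or NULL if out of memory.
   The caller must free() the result. */
char *
upper_dup(const char *s)
{
    char *result = malloc(strlen(s) + 1);

    if (result == NULL)      /* the one test, right at the source */
        return NULL;
    copy_upper(result, s);   /* no second NULL test needed here */
    return result;
}
\end{verbatim}

A caller of \cfunction{upper_dup()} is in turn at a ``source'' and
should test the returned pointer before passing it on.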

% Frank Stajano:
% A pedagogically buggy example, along the lines of the previous listing,
% would be helpful here -- showing in more concrete terms what sort of
% actions could cause the problem. I can't very well imagine it from the
% description.


\section{Writing Extensions in \Cpp{}
         \label{cplusplus}}

It is possible to write extension modules in \Cpp{}. Some restrictions
apply. If the main program (the Python interpreter) is compiled and
linked by the C compiler, global or static objects with constructors
cannot be used. This is not a problem if the main program is linked
by the \Cpp{} compiler. Functions that will be called by the
Python interpreter (in particular, module initialization functions)
have to be declared using \code{extern "C"}.
It is unnecessary to enclose the Python header files in
\code{extern "C" \{...\}} --- they use this form already if the symbol
\samp{__cplusplus} is defined (all recent \Cpp{} compilers define this
symbol).
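
The guard that the Python header files use can be sketched as follows,
for a declaration that should be usable from both C and \Cpp{}. The
function \cfunction{eggs_count()} is a hypothetical example, not part
of any Python header; when the fragment is compiled as C the guard
simply disappears.

\begin{verbatim}
/* Declaration as it might appear in a header shared by C and C++ code. */
#ifdef __cplusplus
extern "C" {
#endif

int eggs_count(int dozens);   /* keeps C linkage when seen by a C++ compiler */

#ifdef __cplusplus
}
#endif

/* Definition, as it would appear in one source file. */
int
eggs_count(int dozens)
{
    return 12 * dozens;
}
\end{verbatim}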


\section{Providing a C API for an Extension Module
         \label{using-cobjects}}
\sectionauthor{Konrad Hinsen}{hinsen@cnrs-orleans.fr}

Many extension modules just provide new functions and types to be
used from Python, but sometimes the code in an extension module can
be useful for other extension modules. For example, an extension
module could implement a type ``collection'' which works like lists
without order. Just as the standard Python list type has a C API
which permits extension modules to create and manipulate lists, this
new collection type should have a set of C functions for direct
manipulation from other extension modules.

At first sight this seems easy: just write the functions (without
declaring them \keyword{static}, of course), provide an appropriate
header file, and document the C API. And in fact this would work if
all extension modules were always linked statically with the Python
interpreter. When modules are used as shared libraries, however, the
symbols defined in one module may not be visible to another module.
The details of visibility depend on the operating system; some systems
use one global namespace for the Python interpreter and all extension
modules (Windows, for example), whereas others require an explicit
list of imported symbols at module link time (AIX is one example), or
offer a choice of different strategies (most Unices). And even if
symbols are globally visible, the module whose functions one wishes to
call might not have been loaded yet!

Portability therefore requires making no assumptions about
symbol visibility. This means that all symbols in extension modules
should be declared \keyword{static}, except for the module's
initialization function, in order to avoid name clashes with other
extension modules (as discussed in section~\ref{methodTable}). And it
means that symbols that \emph{should} be accessible from other
extension modules must be exported in a different way.

Python provides a special mechanism to pass C-level information
(pointers) from one extension module to another one: CObjects.
A CObject is a Python data type which stores a pointer
(\ctype{void *}). CObjects can only be created and accessed via their C API, but
they can be passed around like any other Python object. In particular,
they can be assigned to a name in an extension module's namespace.
Other extension modules can then import this module, retrieve the
value of this name, and then retrieve the pointer from the CObject.

There are many ways in which CObjects can be used to export the C API
of an extension module. Each name could get its own CObject, or all C
API pointers could be stored in an array whose address is published in
a CObject. And the various tasks of storing and retrieving the pointers
can be distributed in different ways between the module providing the
code and the client modules.

The following example demonstrates an approach that puts most of the
burden on the writer of the exporting module, which is appropriate
for commonly used library modules. It stores all C API pointers
(just one in the example!) in an array of \ctype{void} pointers which
becomes the value of a CObject. The header file corresponding to
the module provides a macro that takes care of importing the module
and retrieving its C API pointers; client modules only have to call
this macro before accessing the C API.

The exporting module is a modification of the \module{spam} module from
section~\ref{simpleExample}. The function \function{spam.system()}
does not call the C library function \cfunction{system()} directly,
but calls a function \cfunction{PySpam_System()}, which would of course do
something more complicated in reality (such as adding ``spam'' to
every command). This function \cfunction{PySpam_System()} is also
exported to other extension modules.

The function \cfunction{PySpam_System()} is a plain C function,
declared \keyword{static} like everything else:

\begin{verbatim}
static int
PySpam_System(char *command)
{
    return system(command);
}
\end{verbatim}

The function \cfunction{spam_system()} is modified in a trivial way:

\begin{verbatim}
static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = PySpam_System(command);
    return Py_BuildValue("i", sts);
}
\end{verbatim}

At the beginning of the module, right after the line

\begin{verbatim}
#include "Python.h"
\end{verbatim}

two more lines must be added:

\begin{verbatim}
#define SPAM_MODULE
#include "spammodule.h"
\end{verbatim}

The \code{\#define} is used to tell the header file that it is being
included in the exporting module, not a client module. Finally,
the module's initialization function must take care of initializing
the C API pointer array:

\begin{verbatim}
void
initspam(void)
{
    PyObject *m;
    static void *PySpam_API[PySpam_API_pointers];
    PyObject *c_api_object;

    m = Py_InitModule("spam", SpamMethods);
    if (m == NULL)
        return;

    /* Initialize the C API pointer array */
    PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

    /* Create a CObject containing the API pointer array's address */
    c_api_object = PyCObject_FromVoidPtr((void *)PySpam_API, NULL);

    if (c_api_object != NULL) {
        /* Create a name for this object in the module's namespace */
        PyObject *d = PyModule_GetDict(m);

        PyDict_SetItemString(d, "_C_API", c_api_object);
        Py_DECREF(c_api_object);
    }
}
\end{verbatim}

Note that \code{PySpam_API} is declared \code{static}; otherwise
the pointer array would disappear when \code{initspam} terminates!

The bulk of the work is in the header file \file{spammodule.h},
which looks like this:

\begin{verbatim}
#ifndef Py_SPAMMODULE_H
#define Py_SPAMMODULE_H
#ifdef __cplusplus
extern "C" {
#endif

/* Header file for spammodule */

/* C API functions */
#define PySpam_System_NUM 0
#define PySpam_System_RETURN int
#define PySpam_System_PROTO (char *command)

/* Total number of C API pointers */
#define PySpam_API_pointers 1


#ifdef SPAM_MODULE
/* This section is used when compiling spammodule.c */

static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

#else
/* This section is used in modules that use spammodule's API */

static void **PySpam_API;

#define PySpam_System \
 (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

#define import_spam() \
{ \
    PyObject *module = PyImport_ImportModule("spam"); \
    if (module != NULL) { \
        PyObject *module_dict = PyModule_GetDict(module); \
        PyObject *c_api_object = PyDict_GetItemString(module_dict, "_C_API"); \
        /* PyCObject_Check() does not accept NULL, so test for it first */ \
        if (c_api_object != NULL && PyCObject_Check(c_api_object)) { \
            PySpam_API = (void **)PyCObject_AsVoidPtr(c_api_object); \
        } \
    } \
}

#endif

#ifdef __cplusplus
}
#endif

#endif /* !defined(Py_SPAMMODULE_H) */
\end{verbatim}
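
The core of this header, a table of \ctype{void} pointers indexed by
per-function constants plus a macro that casts an entry back to the
right function type, can be tried out in isolation. The sketch below
does so in plain C without \file{Python.h}. All names starting with
\code{Demo_} or \code{demo_} are hypothetical, and
\cfunction{demo_export_api()} merely stands in for the CObject lookup
done by \cfunction{import_spam()}. Like the example above, it assumes a
platform on which a function pointer survives a round trip through
\ctype{void *}, which ISO C does not guarantee.

\begin{verbatim}
#include <stddef.h>

/* Slot number, return type and prototype of the one exported function. */
#define Demo_Length_NUM 0
#define Demo_Length_RETURN int
#define Demo_Length_PROTO (const char *text)

/* Total number of pointers in the table. */
#define Demo_API_pointers 1

/* --- exporting side (hypothetical) --- */

static int
demo_length_impl(const char *text)
{
    int n = 0;

    while (text[n] != '\0')
        n++;
    return n;
}

/* Static storage, so the table outlives the function that fills it. */
static void *Demo_API_table[Demo_API_pointers];

void **
demo_export_api(void)
{
    Demo_API_table[Demo_Length_NUM] = (void *)demo_length_impl;
    return Demo_API_table;
}

/* --- client side (hypothetical) --- */

static void **Demo_API = NULL;

/* Cast the table entry back to the function type, then dereference it. */
#define Demo_Length \
 (*(Demo_Length_RETURN (*)Demo_Length_PROTO) Demo_API[Demo_Length_NUM])

int
demo_client_call(const char *text)
{
    if (Demo_API == NULL)
        Demo_API = demo_export_api();   /* stands in for import_spam() */
    return Demo_Length(text);
}
\end{verbatim}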

All that a client module must do in order to have access to the
function \cfunction{PySpam_System()} is to call the function (or
rather macro) \cfunction{import_spam()} in its initialization
function:

\begin{verbatim}
void
initclient(void)
{
    PyObject *m;

    m = Py_InitModule("client", ClientMethods);
    if (m == NULL)
        return;
    import_spam();
}
\end{verbatim}

The main disadvantage of this approach is that the file
\file{spammodule.h} is rather complicated. However, the
basic structure is the same for each function that is
exported, so it has to be learned only once.

Finally, it should be mentioned that CObjects offer additional
functionality, which is especially useful for memory allocation and
deallocation of the pointer stored in a CObject. The details
are described in the \citetitle[../api/api.html]{Python/C API
Reference Manual} in the section ``CObjects'' and in the
implementation of CObjects (files \file{Include/cobject.h} and
\file{Objects/cobject.c} in the Python source code distribution).