\chapter{Introduction \label{intro}}


The Application Programmer's Interface to Python gives C and
\Cpp{} programmers access to the Python interpreter at a variety of
levels. The API is equally usable from \Cpp{}, but for brevity it is
generally referred to as the Python/C API. There are two
fundamentally different reasons for using the Python/C API. The first
reason is to write \emph{extension modules} for specific purposes;
these are C modules that extend the Python interpreter. This is
probably the most common use. The second reason is to use Python as a
component in a larger application; this technique is generally
referred to as \dfn{embedding} Python in an application.

Writing an extension module is a relatively well-understood process,
where a ``cookbook'' approach works well. There are several tools
that automate the process to some extent. While people have embedded
Python in other applications since its early existence, the process of
embedding Python is less straightforward than writing an extension.

Many API functions are useful independent of whether you're embedding
or extending Python; moreover, most applications that embed Python
will need to provide a custom extension as well, so it's probably a
good idea to become familiar with writing an extension before
attempting to embed Python in a real application.


\section{Include Files \label{includes}}

All function, type and macro definitions needed to use the Python/C
API are included in your code by the following line:

\begin{verbatim}
#include "Python.h"
\end{verbatim}

This implies inclusion of the following standard headers:
\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>},
\code{<limits.h>}, and \code{<stdlib.h>} (if available).
Since Python may define some pre-processor definitions which affect
the standard headers on some systems, you must include \file{Python.h}
before any standard headers are included.

All user-visible names defined by \file{Python.h} (except those defined by
the included standard headers) have one of the prefixes \samp{Py} or
\samp{_Py}. Names beginning with \samp{_Py} are for internal use by
the Python implementation and should not be used by extension writers.
Structure member names do not have a reserved prefix.

\strong{Important:} user code should never define names that begin
with \samp{Py} or \samp{_Py}. This confuses the reader, and
jeopardizes the portability of the user code to future Python
versions, which may define additional names beginning with one of
these prefixes.

The header files are typically installed with Python. On \UNIX, these
are located in the directories
\file{\envvar{prefix}/include/python\var{version}/} and
\file{\envvar{exec_prefix}/include/python\var{version}/}, where
\envvar{prefix} and \envvar{exec_prefix} are defined by the
corresponding parameters to Python's \program{configure} script and
\var{version} is \code{sys.version[:3]}. On Windows, the headers are
installed in \file{\envvar{prefix}/include}, where \envvar{prefix} is
the installation directory specified to the installer.

To include the headers, place both directories (if different) on your
compiler's search path for includes. Do \emph{not} place the parent
directories on the search path and then use
\samp{\#include <python\shortversion/Python.h>}; this will break on
multi-platform builds since the platform independent headers under
\envvar{prefix} include the platform specific headers from
\envvar{exec_prefix}.

\Cpp{} users should note that though the API is defined entirely using
C, the header files do properly declare the entry points to be
\code{extern "C"}, so there is no need to do anything special to use
the API from \Cpp.


\section{Objects, Types and Reference Counts \label{objects}}

Most Python/C API functions have one or more arguments as well as a
return value of type \ctype{PyObject*}. This type is a pointer
to an opaque data type representing an arbitrary Python
object. Since all Python object types are treated the same way by the
Python language in most situations (e.g., assignments, scope rules,
and argument passing), it is only fitting that they should be
represented by a single C type. Almost all Python objects live on the
heap: you never declare an automatic or static variable of type
\ctype{PyObject}; only pointer variables of type \ctype{PyObject*} can
be declared. The sole exception is the type objects\obindex{type};
since these must never be deallocated, they are typically static
\ctype{PyTypeObject} objects.

All Python objects (even Python integers) have a \dfn{type} and a
\dfn{reference count}. An object's type determines what kind of object
it is (e.g., an integer, a list, or a user-defined function; there are
many more as explained in the \citetitle[../ref/ref.html]{Python
Reference Manual}). For each of the well-known types there is a macro
to check whether an object is of that type; for instance,
\samp{PyList_Check(\var{a})} is true if (and only if) the object
pointed to by \var{a} is a Python list.


\subsection{Reference Counts \label{refcounts}}

The reference count is important because today's computers have a
finite (and often severely limited) memory size; it counts how many
different places there are that have a reference to an object. Such a
place could be another object, or a global (or static) C variable, or
a local variable in some C function. When an object's reference count
becomes zero, the object is deallocated. If it contains references to
other objects, their reference count is decremented. Those other
objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on. (There's an obvious problem
with objects that reference each other here; for now, the solution is
``don't do that.'')

Reference counts are always manipulated explicitly. The normal way is
to use the macro \cfunction{Py_INCREF()}\ttindex{Py_INCREF()} to
increment an object's reference count by one, and
\cfunction{Py_DECREF()}\ttindex{Py_DECREF()} to decrement it by
one. The \cfunction{Py_DECREF()} macro is considerably more complex
than the incref one, since it must check whether the reference count
becomes zero and then cause the object's deallocator to be called.
The deallocator is a function pointer contained in the object's type
structure. The type-specific deallocator takes care of decrementing
the reference counts for other objects contained in the object if this
is a compound object type, such as a list, as well as performing any
additional finalization that's needed. There's no chance that the
reference count can overflow; at least as many bits are used to hold
the reference count as there are distinct memory locations in virtual
memory (assuming \code{sizeof(long) >= sizeof(char*)}). Thus, the
reference count increment is a simple operation.

It is not necessary to increment an object's reference count for every
local variable that contains a pointer to an object. In theory, the
object's reference count goes up by one when the variable is made to
point to it and it goes down by one when the variable goes out of
scope. However, these two cancel each other out, so at the end the
reference count hasn't changed. The only real reason to use the
reference count is to prevent the object from being deallocated as
long as our variable is pointing to it. If we know that there is at
least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count
temporarily. An important situation where this arises is in objects
that are passed as arguments to C functions in an extension module
that are called from Python; the call mechanism guarantees to hold a
reference to every argument for the duration of the call.

However, a common pitfall is to extract an object from a list and
hold on to it for a while without incrementing its reference count.
Some other operation might conceivably remove the object from the
list, decrementing its reference count and possibly deallocating it.
The real danger is that innocent-looking operations may invoke
arbitrary Python code which could do this; there is a code path which
allows control to flow back to the user from a \cfunction{Py_DECREF()},
so almost any operation is potentially dangerous.

A safe approach is to always use the generic operations (functions
whose name begins with \samp{PyObject_}, \samp{PyNumber_},
\samp{PySequence_} or \samp{PyMapping_}). These operations always
increment the reference count of the object they return. This leaves
the caller with the responsibility to call
\cfunction{Py_DECREF()} when they are done with the result; this soon
becomes second nature.
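
As a rough illustration of the mechanics described in this section,
the following self-contained C sketch models an object header that
carries a reference count and a pointer to a type structure holding
the deallocator. It is a toy model under stated assumptions, not the
interpreter's implementation: \code{ToyObject}, \code{ToyType},
\code{toy_incref()}, \code{toy_decref()}, \code{pair_new()} and
\code{live_objects} are hypothetical names that do not appear in
\file{Python.h}. Creating an inner object, wrapping it in an outer
one, and then releasing both references should leave
\code{live_objects} at zero, because releasing the outer object also
releases the reference it holds to the inner one.

\begin{verbatim}
#include <stdlib.h>

/* Hypothetical toy types; this is NOT the real PyObject layout. */
typedef struct toy_object ToyObject;

typedef struct {
    const char *name;
    void (*dealloc)(ToyObject *self);  /* type-specific deallocator */
} ToyType;

struct toy_object {
    long refcnt;                       /* number of references held */
    const ToyType *type;               /* every object has a type */
    ToyObject *child;                  /* one owned reference, or NULL */
};

static int live_objects = 0;           /* bookkeeping for the demo only */

/* The increment is a simple operation. */
static void toy_incref(ToyObject *op)
{
    op->refcnt++;
}

/* The decrement must check for zero and call the deallocator. */
static void toy_decref(ToyObject *op)
{
    if (--op->refcnt == 0)
        op->type->dealloc(op);
}

/* The deallocator releases contained references, then the memory. */
static void pair_dealloc(ToyObject *self)
{
    if (self->child != NULL)
        toy_decref(self->child);       /* may deallocate the child too */
    live_objects--;
    free(self);
}

static const ToyType pair_type = { "pair", pair_dealloc };

/* Returns an object with one reference; takes its own reference
   to child instead of taking over the caller's. */
static ToyObject *pair_new(ToyObject *child)
{
    ToyObject *op = (ToyObject *)malloc(sizeof *op);
    if (op == NULL)
        return NULL;
    op->refcnt = 1;
    op->type = &pair_type;
    op->child = child;
    if (child != NULL)
        toy_incref(child);
    live_objects++;
    return op;
}
\end{verbatim}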


\subsubsection{Reference Count Details \label{refcountDetails}}

The reference count behavior of functions in the Python/C API is best
explained in terms of \emph{ownership of references}. Note that we
talk of owning references, never of owning objects; objects are always
shared! When a function owns a reference, it has to dispose of it
properly --- either by passing ownership on (usually to its caller) or
by calling \cfunction{Py_DECREF()} or \cfunction{Py_XDECREF()}. When
a function passes ownership of a reference on to its caller, the
caller is said to receive a \emph{new} reference. When no ownership
is transferred, the caller is said to \emph{borrow} the reference.
Nothing needs to be done for a borrowed reference.

Conversely, when a calling function passes in a reference to an
object, there are two possibilities: the function \emph{steals} a
reference to the object, or it does not. Few functions steal
references; the two notable exceptions are
\cfunction{PyList_SetItem()}\ttindex{PyList_SetItem()} and
\cfunction{PyTuple_SetItem()}\ttindex{PyTuple_SetItem()}, which
steal a reference to the item (but not to the tuple or list into which
the item is put!). These functions were designed to steal a reference
because of a common idiom for populating a tuple or list with newly
created objects; for example, the code to create the tuple \code{(1,
2, "three")} could look like this (forgetting about error handling for
the moment; a better way to code this is shown below):

\begin{verbatim}
PyObject *t;

t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyInt_FromLong(1L));
PyTuple_SetItem(t, 1, PyInt_FromLong(2L));
PyTuple_SetItem(t, 2, PyString_FromString("three"));
\end{verbatim}

Incidentally, \cfunction{PyTuple_SetItem()} is the \emph{only} way to
set tuple items; \cfunction{PySequence_SetItem()} and
\cfunction{PyObject_SetItem()} refuse to do this since tuples are an
immutable data type. You should only use
\cfunction{PyTuple_SetItem()} for tuples that you are creating
yourself.

Equivalent code for populating a list can be written using
\cfunction{PyList_New()} and \cfunction{PyList_SetItem()}. Such code
can also use \cfunction{PySequence_SetItem()}; this illustrates the
difference between the two (the extra \cfunction{Py_DECREF()} calls):

\begin{verbatim}
PyObject *l, *x;

l = PyList_New(3);
x = PyInt_FromLong(1L);
PySequence_SetItem(l, 0, x); Py_DECREF(x);
x = PyInt_FromLong(2L);
PySequence_SetItem(l, 1, x); Py_DECREF(x);
x = PyString_FromString("three");
PySequence_SetItem(l, 2, x); Py_DECREF(x);
\end{verbatim}

You might find it strange that the ``recommended'' approach takes more
code. However, in practice, you will rarely use these ways of
creating and populating a tuple or list. There's a generic function,
\cfunction{Py_BuildValue()}, that can create most common objects from
C values, directed by a \dfn{format string}. For example, the
above two blocks of code could be replaced by the following (which
also takes care of the error checking):

\begin{verbatim}
PyObject *t, *l;

t = Py_BuildValue("(iis)", 1, 2, "three");
l = Py_BuildValue("[iis]", 1, 2, "three");
\end{verbatim}

It is much more common to use \cfunction{PyObject_SetItem()} and
friends with items whose references you are only borrowing, like
arguments that were passed in to the function you are writing. In
that case, their behaviour regarding reference counts is much saner,
since you don't have to increment a reference count so you can give a
reference away (``have it be stolen''). For example, this function
sets all items of a list (actually, any mutable sequence) to a given
item:

\begin{verbatim}
int set_all(PyObject *target, PyObject *item)
{
    int i, n;

    n = PyObject_Length(target);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        /* PySequence_SetItem() takes an integer index and does not
           steal the reference to item */
        if (PySequence_SetItem(target, i, item) < 0)
            return -1;
    }
    return 0;
}
\end{verbatim}
\ttindex{set_all()}

The situation is slightly different for function return values.
While passing a reference to most functions does not change your
ownership responsibilities for that reference, many functions that
return a reference to an object give you ownership of the reference.
The reason is simple: in many cases, the returned object is created
on the fly, and the reference you get is the only reference to the
object. Therefore, the generic functions that return object
references, like \cfunction{PyObject_GetItem()} and
\cfunction{PySequence_GetItem()}, always return a new reference (the
caller becomes the owner of the reference).

It is important to realize that whether you own a reference returned
by a function depends on which function you call only --- \emph{the
plumage} (the type of the object passed as an
argument to the function) \emph{doesn't enter into it!} Thus, if you
extract an item from a list using \cfunction{PyList_GetItem()}, you
don't own the reference --- but if you obtain the same item from the
same list using \cfunction{PySequence_GetItem()} (which happens to
take exactly the same arguments), you do own a reference to the
returned object.

Here is an example of how you could write a function that computes the
sum of the items in a list of integers; once using
\cfunction{PyList_GetItem()}\ttindex{PyList_GetItem()}, and once using
\cfunction{PySequence_GetItem()}\ttindex{PySequence_GetItem()}.

\begin{verbatim}
long sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    for (i = 0; i < n; i++) {
        item = PyList_GetItem(list, i); /* Can't fail */
        if (!PyInt_Check(item)) continue; /* Skip non-integers */
        total += PyInt_AsLong(item);
    }
    return total;
}
\end{verbatim}
\ttindex{sum_list()}

\begin{verbatim}
long sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;
    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length */
    for (i = 0; i < n; i++) {
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyInt_Check(item))
            total += PyInt_AsLong(item);
        Py_DECREF(item); /* Discard reference ownership */
    }
    return total;
}
\end{verbatim}
\ttindex{sum_sequence()}
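
The bookkeeping behind stolen, borrowed and new references can also be
traced with a small stand-alone model. The sketch below is only an
illustration and makes no claim about how the interpreter implements
lists or tuples: \code{Obj}, \code{Slots} and the four
\code{slots_...()} functions are hypothetical names, the container has
a fixed size, and overwriting an occupied slot is deliberately not
handled. Storing an object with \code{slots_set_steal()} should leave
its count unchanged, storing it with \code{slots_set()} should raise
the count by one, and only \code{slots_get_new()} hands the caller a
reference that must be released later.

\begin{verbatim}
#include <stddef.h>

typedef struct { long refcnt; } Obj;      /* hypothetical toy object */
typedef struct { Obj *items[4]; } Slots;  /* hypothetical container */

/* Steals: the caller's reference becomes the container's reference,
   so the count is not touched. */
static void slots_set_steal(Slots *s, int i, Obj *item)
{
    s->items[i] = item;
}

/* Does not steal: the container takes a reference of its own and
   the caller keeps the one it had. */
static void slots_set(Slots *s, int i, Obj *item)
{
    item->refcnt++;
    s->items[i] = item;
}

/* Borrowed result: nothing to release, but the pointer is only
   safe while the container keeps the item alive. */
static Obj *slots_get_borrowed(Slots *s, int i)
{
    return s->items[i];
}

/* New reference: the caller owns the result and must release it. */
static Obj *slots_get_new(Slots *s, int i)
{
    Obj *item = s->items[i];
    if (item != NULL)
        item->refcnt++;
    return item;
}
\end{verbatim}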


\subsection{Types \label{types}}

There are few other data types that play a significant role in
the Python/C API; most are simple C types such as \ctype{int},
\ctype{long}, \ctype{double} and \ctype{char*}. A few structure types
are used to describe static tables used to list the functions exported
by a module or the data attributes of a new object type, and another
is used to describe the value of a complex number. These will
be discussed together with the functions that use them.


\section{Exceptions \label{exceptions}}

The Python programmer only needs to deal with exceptions if specific
error handling is required; unhandled exceptions are automatically
propagated to the caller, then to the caller's caller, and so on, until
they reach the top-level interpreter, where they are reported to the
user accompanied by a stack traceback.

For C programmers, however, error checking always has to be explicit.
All functions in the Python/C API can raise exceptions, unless an
explicit claim is made otherwise in a function's documentation. In
general, when a function encounters an error, it sets an exception,
discards any object references that it owns, and returns an
error indicator --- usually \NULL{} or \code{-1}. A few functions
return a Boolean true/false result, with false indicating an error.
Very few functions return no explicit error indicator or have an
ambiguous return value, and require explicit testing for errors with
\cfunction{PyErr_Occurred()}\ttindex{PyErr_Occurred()}.

Exception state is maintained in per-thread storage (this is
equivalent to using global storage in an unthreaded application). A
thread can be in one of two states: an exception has occurred, or not.
The function \cfunction{PyErr_Occurred()} can be used to check for
this: it returns a borrowed reference to the exception type object
when an exception has occurred, and \NULL{} otherwise. There are a
number of functions to set the exception state:
\cfunction{PyErr_SetString()}\ttindex{PyErr_SetString()} is the most
common (though not the most general) function to set the exception
state, and \cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} clears the
exception state.

The full exception state consists of three objects (all of which can
be \NULL): the exception type, the corresponding exception
value, and the traceback. These have the same meanings as the Python
\withsubitem{(in module sys)}{
  \ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}}
objects \code{sys.exc_type}, \code{sys.exc_value}, and
\code{sys.exc_traceback}; however, they are not the same: the Python
objects represent the last exception being handled by a Python
\keyword{try} \ldots\ \keyword{except} statement, while the C level
exception state only exists while an exception is being passed on
between C functions until it reaches the Python bytecode interpreter's
main loop, which takes care of transferring it to \code{sys.exc_type}
and friends.

Note that starting with Python 1.5, the preferred, thread-safe way to
access the exception state from Python code is to call the function
\withsubitem{(in module sys)}{\ttindex{exc_info()}}
\function{sys.exc_info()}, which returns the per-thread exception state
for Python code. Also, the semantics of both ways to access the
exception state have changed so that a function which catches an
exception will save and restore its thread's exception state so as to
preserve the exception state of its caller. This prevents common bugs
in exception handling code caused by an innocent-looking function
overwriting the exception being handled; it also reduces the often
unwanted lifetime extension for objects that are referenced by the
stack frames in the traceback.

As a general principle, a function that calls another function to
perform some task should check whether the called function raised an
exception, and if so, pass the exception state on to its caller. It
should discard any object references that it owns, and return an
error indicator, but it should \emph{not} set another exception ---
that would overwrite the exception that was just raised, and lose
important information about the exact cause of the error.

A simple example of detecting exceptions and passing them on is shown
in the \cfunction{sum_sequence()}\ttindex{sum_sequence()} example
above. It so happens that this example doesn't need to clean up any
owned references when it detects an error. The following example
function shows some error cleanup. First, to remind you why you like
Python, we show the equivalent Python code:

\begin{verbatim}
def incr_item(dict, key):
    try:
        item = dict[key]
    except KeyError:
        item = 0
    dict[key] = item + 1
\end{verbatim}
\ttindex{incr_item()}

Here is the corresponding C code, in all its glory:

\begin{verbatim}
int incr_item(PyObject *dict, PyObject *key)
{
    /* Objects all initialized to NULL for Py_XDECREF */
    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
    int rv = -1; /* Return value initialized to -1 (failure) */

    item = PyObject_GetItem(dict, key);
    if (item == NULL) {
        /* Handle KeyError only: */
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;

        /* Clear the error and use zero: */
        PyErr_Clear();
        item = PyInt_FromLong(0L);
        if (item == NULL)
            goto error;
    }
    const_one = PyInt_FromLong(1L);
    if (const_one == NULL)
        goto error;

    incremented_item = PyNumber_Add(item, const_one);
    if (incremented_item == NULL)
        goto error;

    if (PyObject_SetItem(dict, key, incremented_item) < 0)
        goto error;
    rv = 0; /* Success */
    /* Continue with cleanup code */

 error:
    /* Cleanup code, shared by success and failure path */

    /* Use Py_XDECREF() to ignore NULL references */
    Py_XDECREF(item);
    Py_XDECREF(const_one);
    Py_XDECREF(incremented_item);

    return rv; /* -1 for error, 0 for success */
}
\end{verbatim}
\ttindex{incr_item()}

This example represents an endorsed use of the \keyword{goto} statement
in C! It illustrates the use of
\cfunction{PyErr_ExceptionMatches()}\ttindex{PyErr_ExceptionMatches()} and
\cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} to
handle specific exceptions, and the use of
\cfunction{Py_XDECREF()}\ttindex{Py_XDECREF()} to
dispose of owned references that may be \NULL{} (note the
\character{X} in the name; \cfunction{Py_DECREF()} would crash when
confronted with a \NULL{} reference). It is important that the
variables used to hold owned references are initialized to \NULL{} for
this to work; likewise, the proposed return value is initialized to
\code{-1} (failure) and only set to success after the final call made
is successful.
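
The conventions of this section (set an error indicator once, return
an error value, and let callers pass the failure on without setting a
second exception) can be modelled in miniature with plain C. The
sketch below is not the interpreter's error machinery:
\code{toy_err_set()}, \code{toy_err_occurred()} and
\code{toy_err_clear()} are hypothetical stand-ins that only mimic the
roles of the corresponding \samp{PyErr_} functions, strings stand in
for exception objects, and the indicator lives in ordinary static
storage, which corresponds to the unthreaded case mentioned above.
Calling \code{toy_average()} with a zero count should return
\code{-1} and leave the indicator set by \code{toy_divide()}
untouched until it is cleared explicitly.

\begin{verbatim}
#include <stddef.h>

/* Hypothetical toy error indicator (type and message only). */
static const char *toy_err_type = NULL;
static const char *toy_err_msg = NULL;

static void toy_err_set(const char *type, const char *msg)
{
    toy_err_type = type;
    toy_err_msg = msg;
}

/* Returns the pending error type, or NULL if no error is set. */
static const char *toy_err_occurred(void)
{
    return toy_err_type;
}

static void toy_err_clear(void)
{
    toy_err_type = NULL;
    toy_err_msg = NULL;
}

/* Lowest level: detects the error, sets the indicator, and
   returns the error value -1. */
static int toy_divide(int a, int b, int *result)
{
    if (b == 0) {
        toy_err_set("ZeroDivisionError", "division by zero");
        return -1;
    }
    *result = a / b;
    return 0;
}

/* Middle level: checks the callee's return value and passes the
   failure on without setting another error. */
static int toy_average(int total, int count, int *result)
{
    if (toy_divide(total, count, result) < 0)
        return -1;  /* indicator was already set by toy_divide() */
    return 0;
}
\end{verbatim}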


\section{Embedding Python \label{embedding}}

The one important task that only embedders (as opposed to extension
writers) of the Python interpreter have to worry about is the
initialization, and possibly the finalization, of the Python
interpreter. Most functionality of the interpreter can only be used
after the interpreter has been initialized.

The basic initialization function is
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}.
This initializes the table of loaded modules, and creates the
fundamental modules \module{__builtin__}\refbimodindex{__builtin__},
\module{__main__}\refbimodindex{__main__}, \module{sys}\refbimodindex{sys},
and \module{exceptions}.\refbimodindex{exceptions} It also initializes
the module search path (\code{sys.path}).%
\indexiii{module}{search}{path}
\withsubitem{(in module sys)}{\ttindex{path}}

\cfunction{Py_Initialize()} does not set the ``script argument list''
(\code{sys.argv}). If this variable is needed by Python code that
will be executed later, it must be set explicitly with a call to
\code{PySys_SetArgv(\var{argc},
\var{argv})}\ttindex{PySys_SetArgv()} subsequent to the call to
\cfunction{Py_Initialize()}.

On most systems (in particular, on \UNIX{} and Windows, although the
details are slightly different),
\cfunction{Py_Initialize()} calculates the module search path based
upon its best guess for the location of the standard Python
interpreter executable, assuming that the Python library is found in a
fixed location relative to the Python interpreter executable. In
particular, it looks for a directory named
\file{lib/python\shortversion} relative to the parent directory where
the executable named \file{python} is found on the shell command
search path (the environment variable \envvar{PATH}).

For instance, if the Python executable is found in
\file{/usr/local/bin/python}, it will assume that the libraries are in
\file{/usr/local/lib/python\shortversion}. (In fact, this particular path
is also the ``fallback'' location, used when no executable file named
\file{python} is found along \envvar{PATH}.) The user can override
this behavior by setting the environment variable \envvar{PYTHONHOME},
or insert additional directories in front of the standard path by
setting \envvar{PYTHONPATH}.
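
The path arithmetic in the \file{/usr/local/bin/python} example can be
written out as a few lines of C. This is only a sketch of that one
step, assuming an absolute \UNIX-style path and a version string
supplied by the caller; \code{toy_guess_libdir()} is a hypothetical
helper, not part of the API, and it leaves out everything else the
text mentions (the search along \envvar{PATH}, the fallback location,
\envvar{PYTHONHOME} and \envvar{PYTHONPATH}). Given
\file{/usr/local/bin/python} and the version \code{"2.2"} it should
produce \file{/usr/local/lib/python2.2}, and it reports failure with
\code{-1} when the path has no parent directory or the result does not
fit in the buffer.

\begin{verbatim}
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive "<parent>/lib/python<version>" from
   the full path of the executable.  Returns 0 on success, -1 if
   the path has no parent directory or the buffer is too small. */
static int toy_guess_libdir(const char *exe, const char *version,
                            char *out, size_t outsize)
{
    const char *slash = strrchr(exe, '/');
    size_t len;
    int n;

    if (slash == NULL)
        return -1;                    /* no directory part at all */
    len = (size_t)(slash - exe);      /* e.g. "/usr/local/bin" */
    while (len > 0 && exe[len - 1] != '/')
        len--;                        /* drop the last component */
    if (len == 0)
        return -1;                    /* no parent directory */
    /* exe[0..len-1] now ends in '/', e.g. "/usr/local/" */
    n = snprintf(out, outsize, "%.*s%s%s",
                 (int)len, exe, "lib/python", version);
    if (n < 0 || (size_t)n >= outsize)
        return -1;                    /* output was truncated */
    return 0;
}
\end{verbatim}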

The embedding application can steer the search by calling
\code{Py_SetProgramName(\var{file})}\ttindex{Py_SetProgramName()} \emph{before} calling
\cfunction{Py_Initialize()}. Note that \envvar{PYTHONHOME} still
overrides this and \envvar{PYTHONPATH} is still inserted in front of
the standard path. An application that requires total control has to
provide its own implementation of
\cfunction{Py_GetPath()}\ttindex{Py_GetPath()},
\cfunction{Py_GetPrefix()}\ttindex{Py_GetPrefix()},
\cfunction{Py_GetExecPrefix()}\ttindex{Py_GetExecPrefix()}, and
\cfunction{Py_GetProgramFullPath()}\ttindex{Py_GetProgramFullPath()} (all
defined in \file{Modules/getpath.c}).

Sometimes, it is desirable to ``uninitialize'' Python. For instance,
the application may want to start over (make another call to
\cfunction{Py_Initialize()}) or the application is simply done with its
use of Python and wants to free all memory allocated by Python. This
can be accomplished by calling \cfunction{Py_Finalize()}. The function
\cfunction{Py_IsInitialized()}\ttindex{Py_IsInitialized()} returns
true if Python is currently in the initialized state. More
information about these functions is given in a later chapter.
558information about these functions is given in a later chapter.