\chapter{Introduction \label{intro}}


The Application Programmer's Interface to Python gives C and
\Cpp{} programmers access to the Python interpreter at a variety of
levels. The API is equally usable from \Cpp, but for brevity it is
generally referred to as the Python/C API. There are two
fundamentally different reasons for using the Python/C API. The first
reason is to write \emph{extension modules} for specific purposes;
these are C modules that extend the Python interpreter. This is
probably the most common use. The second reason is to use Python as a
component in a larger application; this technique is generally
referred to as \dfn{embedding} Python in an application.

Writing an extension module is a relatively well-understood process,
where a ``cookbook'' approach works well. There are several tools
that automate the process to some extent. While people have embedded
Python in other applications since its early existence, the process of
embedding Python is less straightforward than writing an extension.

Many API functions are useful independent of whether you're embedding
or extending Python; moreover, most applications that embed Python
will need to provide a custom extension as well, so it's probably a
good idea to become familiar with writing an extension before
attempting to embed Python in a real application.


\section{Include Files \label{includes}}

All function, type and macro definitions needed to use the Python/C
API are included in your code by the following line:

\begin{verbatim}
#include "Python.h"
\end{verbatim}

This implies inclusion of the following standard headers:
\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>},
\code{<limits.h>}, and \code{<stdlib.h>} (if available).
Since Python may define some preprocessor definitions which affect
the standard headers on some systems, you must include \file{Python.h}
before any standard headers are included.

All user-visible names defined by \file{Python.h} (except those defined by
the included standard headers) have one of the prefixes \samp{Py} or
\samp{_Py}. Names beginning with \samp{_Py} are for internal use by
the Python implementation and should not be used by extension writers.
Structure member names do not have a reserved prefix.

\strong{Important:} user code should never define names that begin
with \samp{Py} or \samp{_Py}. This confuses the reader, and
jeopardizes the portability of the user code to future Python
versions, which may define additional names beginning with one of
these prefixes.

The header files are typically installed with Python. On \UNIX, these
are located in the directories
\file{\envvar{prefix}/include/python\var{version}/} and
\file{\envvar{exec_prefix}/include/python\var{version}/}, where
\envvar{prefix} and \envvar{exec_prefix} are defined by the
corresponding parameters to Python's \program{configure} script and
\var{version} is \code{sys.version[:3]}. On Windows, the headers are
installed in \file{\envvar{prefix}/include}, where \envvar{prefix} is
the installation directory specified to the installer.

To include the headers, place both directories (if different) on your
compiler's search path for includes. Do \emph{not} place the parent
directories on the search path and then use
\samp{\#include <python\shortversion/Python.h>}; this will break on
multi-platform builds since the platform-independent headers under
\envvar{prefix} include the platform-specific headers from
\envvar{exec_prefix}.

\Cpp{} users should note that though the API is defined entirely using
C, the header files do properly declare the entry points to be
\code{extern "C"}, so there is no need to do anything special to use
the API from \Cpp.


\section{Objects, Types and Reference Counts \label{objects}}

Most Python/C API functions have one or more arguments as well as a
return value of type \ctype{PyObject*}. This type is a pointer
to an opaque data type representing an arbitrary Python
object. Since all Python object types are treated the same way by the
Python language in most situations (e.g., assignments, scope rules,
and argument passing), it is only fitting that they should be
represented by a single C type. Almost all Python objects live on the
heap: you never declare an automatic or static variable of type
\ctype{PyObject}; only pointer variables of type \ctype{PyObject*} can
be declared. The only exceptions are the type objects\obindex{type};
since these must never be deallocated, they are typically static
\ctype{PyTypeObject} objects.

All Python objects (even Python integers) have a \dfn{type} and a
\dfn{reference count}. An object's type determines what kind of object
it is (e.g., an integer, a list, or a user-defined function; there are
many more as explained in the \citetitle[../ref/ref.html]{Python
Reference Manual}). For each of the well-known types there is a macro
to check whether an object is of that type; for instance,
\samp{PyList_Check(\var{a})} is true if (and only if) the object
pointed to by \var{a} is a Python list.
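The idea behind such a check can be sketched as a toy model in plain C, loosely patterned on Python's object layout. The names \ctype{toy_object}, \ctype{toy_type}, \code{ToyList_Type}, \code{TOY_LIST_CHECK} and \cfunction{count_lists()} are hypothetical, not part of the Python/C API. The toy check simply compares an object's type pointer with a static type object; it matches the exact type only, whereas the real macros may also accept subtypes.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical names): every object begins with a
   reference count and a pointer to its type object. */
typedef struct toy_type {
    const char *tp_name;
} toy_type;

typedef struct toy_object {
    long ob_refcnt;
    toy_type *ob_type;
} toy_object;

/* Type objects are static and never deallocated. */
static toy_type ToyList_Type = { "list" };
static toy_type ToyInt_Type = { "int" };

/* Exact-type checks: compare the object's type pointer. */
#define TOY_LIST_CHECK(op) (((const toy_object *)(op))->ob_type == &ToyList_Type)
#define TOY_INT_CHECK(op)  (((const toy_object *)(op))->ob_type == &ToyInt_Type)

/* Counts how many of the n objects in objs are toy lists. */
static size_t count_lists(toy_object **objs, size_t n)
{
    size_t i, count = 0;

    assert(objs != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (TOY_LIST_CHECK(objs[i]))
            count++;
    return count;
}
\end{verbatim}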


\subsection{Reference Counts \label{refcounts}}

The reference count is important because today's computers have a
finite (and often severely limited) memory size; it counts how many
different places there are that have a reference to an object. Such a
place could be another object, or a global (or static) C variable, or
a local variable in some C function. When an object's reference count
becomes zero, the object is deallocated. If it contains references to
other objects, their reference count is decremented. Those other
objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on. (There's an obvious problem
with objects that reference each other here; for now, the solution is
``don't do that.'')
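The cascade can be made concrete with a toy model in plain C. The \ctype{node} type and the functions \cfunction{node_new()}, \cfunction{node_incref()} and \cfunction{node_decref()} are hypothetical, not part of the Python/C API. Each node holds at most one owned reference to another node; dropping the last reference to an outer node frees it and then releases the reference it held, which may free the next node in turn.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>

/* Toy model (hypothetical): a node may own one reference to a child. */
typedef struct node {
    long refcnt;
    struct node *child;      /* owned reference, or NULL */
} node;

static int freed_count = 0;  /* number of nodes deallocated so far */

/* Creates a node with a count of 1; takes over the caller's
   reference to child (no increment). */
static node *node_new(node *child)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->refcnt = 1;
    n->child = child;
    return n;
}

static void node_incref(node *n)
{
    n->refcnt++;
}

/* Drops one reference; at zero, frees the node and releases the
   reference it held, which may cascade further. */
static void node_decref(node *n)
{
    node *child;

    if (n == NULL)
        return;
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        child = n->child;
        free(n);
        freed_count++;
        node_decref(child);
    }
}
\end{verbatim}

A node that still has another reference, for instance one added elsewhere with \cfunction{node_incref()}, survives the cascade.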

Reference counts are always manipulated explicitly. The normal way is
to use the macro \cfunction{Py_INCREF()}\ttindex{Py_INCREF()} to
increment an object's reference count by one, and
\cfunction{Py_DECREF()}\ttindex{Py_DECREF()} to decrement it by
one. The \cfunction{Py_DECREF()} macro is considerably more complex
than \cfunction{Py_INCREF()}, since it must check whether the reference
count becomes zero and then cause the object's deallocator to be called.
The deallocator is a function pointer contained in the object's type
structure. The type-specific deallocator takes care of decrementing
the reference counts for other objects contained in the object if this
is a compound object type, such as a list, as well as performing any
additional finalization that's needed. There's no chance that the
reference count can overflow; at least as many bits are used to hold
the reference count as there are distinct memory locations in virtual
memory (assuming \code{sizeof(long) >= sizeof(char*)}). Thus, the
reference count increment is a simple operation.
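The following sketch models that division of labor in plain C. It is a simplified illustration, not Python's actual macro definitions; the toy names \code{TOY_INCREF}, \code{TOY_DECREF}, \ctype{toy_type}, \ctype{toy_object} and \cfunction{plain_new()} are hypothetical. Incrementing is a single addition, while decrementing must also test for zero and dispatch to the deallocator stored in the type structure.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>

/* Toy model (hypothetical): the deallocator lives in the type. */
struct toy_object;
typedef void (*toy_destructor)(struct toy_object *);

typedef struct toy_type {
    const char *tp_name;
    toy_destructor tp_dealloc;
} toy_type;

typedef struct toy_object {
    long ob_refcnt;
    toy_type *ob_type;
} toy_object;

/* Incrementing is trivial ... */
#define TOY_INCREF(op) ((op)->ob_refcnt++)

/* ... but decrementing must detect zero and call the deallocator. */
#define TOY_DECREF(op)                                 \
    do {                                               \
        toy_object *toy_tmp_ = (op);                   \
        assert(toy_tmp_->ob_refcnt > 0);               \
        if (--toy_tmp_->ob_refcnt == 0)                \
            toy_tmp_->ob_type->tp_dealloc(toy_tmp_);   \
    } while (0)

static int dealloc_calls = 0;

/* Type-specific deallocator for a type with no contained objects. */
static void plain_dealloc(toy_object *op)
{
    dealloc_calls++;
    free(op);
}

static toy_type Plain_Type = { "plain", plain_dealloc };

static toy_object *plain_new(void)
{
    toy_object *op = malloc(sizeof *op);
    if (op == NULL)
        abort();
    op->ob_refcnt = 1;
    op->ob_type = &Plain_Type;
    return op;
}
\end{verbatim}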

It is not necessary to increment an object's reference count for every
local variable that contains a pointer to an object. In theory, the
object's reference count goes up by one when the variable is made to
point to it and it goes down by one when the variable goes out of
scope. However, these two cancel each other out, so at the end the
reference count hasn't changed. The only real reason to use the
reference count is to prevent the object from being deallocated as
long as our variable is pointing to it. If we know that there is at
least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count
temporarily. An important situation where this arises is in objects
that are passed as arguments to C functions in an extension module
that are called from Python; the call mechanism guarantees that it
holds a reference to every argument for the duration of the call.

However, a common pitfall is to extract an object from a list and
hold on to it for a while without incrementing its reference count.
Some other operation might conceivably remove the object from the
list, decrementing its reference count and possibly deallocating it.
The real danger is that innocent-looking operations may invoke
arbitrary Python code which could do this; there is a code path which
allows control to flow back to the user from a \cfunction{Py_DECREF()},
so almost any operation is potentially dangerous.
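One defensive habit is to take your own reference before any operation that could run arbitrary code, and release it afterwards. The toy model below uses hypothetical types and functions in plain C, not the Python/C API, and marks an object as dead instead of freeing it so that the effect can be observed. Because \cfunction{use_item_safely()} increments the count of the borrowed item before the list drops its reference, the item stays alive until the function's own \cfunction{obj_decref()}.

\begin{verbatim}
#include <assert.h>

/* Toy model (hypothetical, not the Python API). */
typedef struct obj {
    long refcnt;
    int alive;           /* cleared at count zero instead of freeing */
} obj;

static void obj_incref(obj *o)
{
    o->refcnt++;
}

static void obj_decref(obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0)
        o->alive = 0;    /* stands in for deallocation */
}

#define LIST_CAP 4

typedef struct list {
    obj *items[LIST_CAP];   /* each slot holds an owned reference */
    int size;
} list;

/* Borrowed reference: the list still owns the item. */
static obj *list_get_borrowed(list *l, int i)
{
    return l->items[i];
}

/* Removing an item drops the list's reference to it. */
static void list_remove_last(list *l)
{
    assert(l->size > 0);
    obj_decref(l->items[--l->size]);
}

/* Safe pattern: own a reference while holding the pointer. */
static int use_item_safely(list *l)
{
    obj *item;
    int still_alive;

    assert(l->size > 0);
    item = list_get_borrowed(l, l->size - 1);
    obj_incref(item);        /* now we own a reference */
    list_remove_last(l);     /* may drop the list's reference */
    still_alive = item->alive;
    obj_decref(item);        /* done with the item */
    return still_alive;
}
\end{verbatim}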

A safe approach is to always use the generic operations (functions
whose name begins with \samp{PyObject_}, \samp{PyNumber_},
\samp{PySequence_} or \samp{PyMapping_}). These operations always
increment the reference count of the object they return. This leaves
the caller with the responsibility to call
\cfunction{Py_DECREF()} when they are done with the result; this soon
becomes second nature.


\subsubsection{Reference Count Details \label{refcountDetails}}

The reference count behavior of functions in the Python/C API is best
explained in terms of \emph{ownership of references}. Ownership
pertains to references, never to objects (objects are not owned: they
are always shared). ``Owning a reference'' means being responsible for
calling \cfunction{Py_DECREF()} on it when the reference is no longer
needed. Ownership can also be transferred, meaning that the code that
receives ownership of the reference then becomes responsible for
eventually decref'ing it by calling \cfunction{Py_DECREF()} or
\cfunction{Py_XDECREF()} when it's no longer needed, or for passing on
this responsibility (usually to its caller).
When a function passes ownership of a reference on to its caller, the
caller is said to receive a \emph{new} reference. When no ownership
is transferred, the caller is said to \emph{borrow} the reference.
Nothing needs to be done for a borrowed reference.
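The distinction can be modeled with a plain counter. In this sketch (hypothetical names, not the Python/C API), \cfunction{get_new()} hands its caller a new reference, \cfunction{forward_new()} passes that ownership on to its own caller unchanged, and \cfunction{get_borrowed()} returns a borrowed reference that the caller must not decrement.

\begin{verbatim}
#include <assert.h>

/* Toy model (hypothetical): only the count matters here. */
typedef struct obj {
    long refcnt;
} obj;

static void obj_incref(obj *o)
{
    o->refcnt++;
}

static void obj_decref(obj *o)
{
    assert(o->refcnt > 0);
    o->refcnt--;
}

/* Pretend some container owns this object's single reference. */
static obj shared = { 1 };

/* Returns a NEW reference: the caller owes one obj_decref(). */
static obj *get_new(void)
{
    obj_incref(&shared);
    return &shared;
}

/* Passes ownership on: no increment, no decrement, so its caller
   now owns the reference that get_new() handed over. */
static obj *forward_new(void)
{
    return get_new();
}

/* Returns a BORROWED reference: the caller must not decrement it. */
static obj *get_borrowed(void)
{
    return &shared;
}
\end{verbatim}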

Conversely, when a caller passes a function a reference to an
object, there are two possibilities: the function \emph{steals} a
reference to the object, or it does not. Few functions steal
references; the two notable exceptions are
\cfunction{PyList_SetItem()}\ttindex{PyList_SetItem()} and
\cfunction{PyTuple_SetItem()}\ttindex{PyTuple_SetItem()}, which
steal a reference to the item (but not to the tuple or list into which
the item is put!). These functions were designed to steal a reference
because of a common idiom for populating a tuple or list with newly
created objects; for example, the code to create the tuple \code{(1,
2, "three")} could look like this (forgetting about error handling for
the moment; a better way to code this is shown below):

\begin{verbatim}
PyObject *t;

t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyInt_FromLong(1L));
PyTuple_SetItem(t, 1, PyInt_FromLong(2L));
PyTuple_SetItem(t, 2, PyString_FromString("three"));
\end{verbatim}
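What ``stealing'' means for the counts can be shown with a toy three-slot container in plain C (hypothetical names, not the Python/C API). The store function does not increment the item's count: the caller's reference simply becomes the container's, so the caller must not decrement it afterwards. This toy version also releases any item previously stored in the slot.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical): only the count matters here. */
typedef struct obj {
    long refcnt;
} obj;

static void obj_decref(obj *o)
{
    assert(o->refcnt > 0);
    o->refcnt--;          /* a real implementation would free at zero */
}

typedef struct tuple3 {
    obj *items[3];        /* owned references, or NULL */
} tuple3;

/* Stores item in slot i, stealing the caller's reference: there is
   no increment, so ownership moves from the caller to the tuple. */
static void tuple3_setitem(tuple3 *t, int i, obj *item)
{
    obj *old;

    assert(i >= 0 && i < 3);
    old = t->items[i];
    t->items[i] = item;
    if (old != NULL)
        obj_decref(old);  /* drop the reference to the replaced item */
}
\end{verbatim}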

Incidentally, \cfunction{PyTuple_SetItem()} is the \emph{only} way to
set tuple items; \cfunction{PySequence_SetItem()} and
\cfunction{PyObject_SetItem()} refuse to do this since tuples are an
immutable data type. You should only use
\cfunction{PyTuple_SetItem()} for tuples that you are creating
yourself.

Equivalent code for populating a list can be written using
\cfunction{PyList_New()} and \cfunction{PyList_SetItem()}. Such code
can also use \cfunction{PySequence_SetItem()}; this illustrates the
difference between the two (the extra \cfunction{Py_DECREF()} calls):

\begin{verbatim}
PyObject *l, *x;

l = PyList_New(3);
x = PyInt_FromLong(1L);
PySequence_SetItem(l, 0, x); Py_DECREF(x);
x = PyInt_FromLong(2L);
PySequence_SetItem(l, 1, x); Py_DECREF(x);
x = PyString_FromString("three");
PySequence_SetItem(l, 2, x); Py_DECREF(x);
\end{verbatim}
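The bookkeeping behind those extra \cfunction{Py_DECREF()} calls can be checked with a toy model in plain C (hypothetical names, not the Python/C API). \cfunction{store_steal()} takes over the caller's reference, whereas \cfunction{store_borrow()} increments the count so that the container owns a reference of its own, which leaves the caller holding one it must release. After that extra decrement, both objects end up with the same count.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical): only the count matters here. */
typedef struct obj {
    long refcnt;
} obj;

static void obj_incref(obj *o)
{
    o->refcnt++;
}

static void obj_decref(obj *o)
{
    assert(o->refcnt > 0);
    o->refcnt--;
}

typedef struct slots3 {
    obj *items[3];        /* owned references, or NULL */
} slots3;

/* Stealing store: the container takes over the caller's reference. */
static void store_steal(slots3 *s, int i, obj *item)
{
    if (s->items[i] != NULL)
        obj_decref(s->items[i]);
    s->items[i] = item;
}

/* Non-stealing store: the container takes its own reference,
   and the caller keeps (and must eventually release) theirs. */
static void store_borrow(slots3 *s, int i, obj *item)
{
    obj_incref(item);
    store_steal(s, i, item);
}
\end{verbatim}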

You might find it strange that the ``recommended'' approach takes more
code. However, in practice, you will rarely use these ways of
creating and populating a tuple or list. There's a generic function,
\cfunction{Py_BuildValue()}, that can create most common objects from
C values, directed by a \dfn{format string}. For example, the
above two blocks of code could be replaced by the following (which
also takes care of error checking for the individual items):

\begin{verbatim}
PyObject *t, *l;

t = Py_BuildValue("(iis)", 1, 2, "three");
l = Py_BuildValue("[iis]", 1, 2, "three");
\end{verbatim}

It is much more common to use \cfunction{PyObject_SetItem()} and
friends with items whose references you are only borrowing, like
arguments that were passed in to the function you are writing. In
that case, their behavior regarding reference counts is much saner,
since you don't have to increment a reference count just so you can
give a reference away (``have it be stolen''). For example, this
function sets all items of a list (actually, any mutable sequence) to
a given item:

\begin{verbatim}
int
set_all(PyObject *target, PyObject *item)
{
    int i, n;

    n = PyObject_Length(target);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        /* PySequence_SetItem() takes a C integer index and does
           not steal the reference to item */
        if (PySequence_SetItem(target, i, item) < 0)
            return -1;
    }
    return 0;
}
\end{verbatim}
\ttindex{set_all()}

The situation is slightly different for function return values.
While passing a reference to most functions does not change your
ownership responsibilities for that reference, many functions that
return a reference to an object give you ownership of the reference.
The reason is simple: in many cases, the returned object is created
on the fly, and the reference you get is the only reference to the
object. Therefore, the generic functions that return object
references, like \cfunction{PyObject_GetItem()} and
\cfunction{PySequence_GetItem()}, always return a new reference (the
caller becomes the owner of the reference).

It is important to realize that whether you own a reference returned
by a function depends only on which function you call: \emph{the
plumage} (the type of the object passed as an
argument to the function) \emph{doesn't enter into it!} Thus, if you
extract an item from a list using \cfunction{PyList_GetItem()}, you
don't own the reference, but if you obtain the same item from the
same list using \cfunction{PySequence_GetItem()} (which happens to
take exactly the same arguments), you do own a reference to the
returned object.

Here is an example of how you could write a function that computes the
sum of the items in a list of integers, once using
\cfunction{PyList_GetItem()}\ttindex{PyList_GetItem()}, and once using
\cfunction{PySequence_GetItem()}\ttindex{PySequence_GetItem()}.

\begin{verbatim}
long
sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    for (i = 0; i < n; i++) {
        item = PyList_GetItem(list, i); /* Can't fail */
        if (!PyInt_Check(item)) continue; /* Skip non-integers */
        total += PyInt_AsLong(item);
    }
    return total;
}
\end{verbatim}
\ttindex{sum_list()}

\begin{verbatim}
long
sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length */
    for (i = 0; i < n; i++) {
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyInt_Check(item))
            total += PyInt_AsLong(item);
        Py_DECREF(item); /* Discard reference ownership */
    }
    return total;
}
\end{verbatim}
\ttindex{sum_sequence()}


\subsection{Types \label{types}}

There are few other data types that play a significant role in
the Python/C API; most are simple C types such as \ctype{int},
\ctype{long}, \ctype{double} and \ctype{char*}. A few structure types
are used to describe static tables used to list the functions exported
by a module or the data attributes of a new object type, and another
is used to describe the value of a complex number. These will
be discussed together with the functions that use them.


\section{Exceptions \label{exceptions}}

The Python programmer only needs to deal with exceptions if specific
error handling is required; unhandled exceptions are automatically
propagated to the caller, then to the caller's caller, and so on, until
they reach the top-level interpreter, where they are reported to the
user accompanied by a stack traceback.

For C programmers, however, error checking always has to be explicit.
All functions in the Python/C API can raise exceptions, unless an
explicit claim is made otherwise in a function's documentation. In
general, when a function encounters an error, it sets an exception,
discards any object references that it owns, and returns an
error indicator (usually \NULL{} or \code{-1}). A few functions
return a Boolean true/false result, with false indicating an error.
Very few functions return no explicit error indicator or have an
ambiguous return value, and require explicit testing for errors with
\cfunction{PyErr_Occurred()}\ttindex{PyErr_Occurred()}.

Exception state is maintained in per-thread storage (this is
equivalent to using global storage in an unthreaded application). A
thread can be in one of two states: an exception has occurred, or not.
The function \cfunction{PyErr_Occurred()} can be used to check for
this: it returns a borrowed reference to the exception type object
when an exception has occurred, and \NULL{} otherwise. There are a
number of functions to set the exception state:
\cfunction{PyErr_SetString()}\ttindex{PyErr_SetString()} is the most
common (though not the most general) function to set the exception
state, and \cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} clears the
exception state.
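The protocol can be mimicked with a toy, single-threaded exception state in plain C. All names here (\cfunction{toy_err_set_string()}, \cfunction{toy_err_occurred()}, \cfunction{toy_err_matches()}, \cfunction{toy_err_clear()}, \cfunction{toy_parse_digit()}) are hypothetical and only model the behavior described above; the real state is per-thread and holds Python objects rather than strings. \cfunction{toy_parse_digit()} follows the usual convention of setting an exception and returning \code{-1} on failure.

\begin{verbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy exception state: one global stands in for per-thread storage. */
static const char *toy_exc_type = NULL;
static const char *toy_exc_msg = NULL;

static void toy_err_set_string(const char *type, const char *msg)
{
    toy_exc_type = type;
    toy_exc_msg = msg;
}

/* Returns the current exception type, or NULL if none is set. */
static const char *toy_err_occurred(void)
{
    return toy_exc_type;
}

/* True if an exception of the given type is set. */
static int toy_err_matches(const char *type)
{
    return toy_exc_type != NULL && strcmp(toy_exc_type, type) == 0;
}

static void toy_err_clear(void)
{
    toy_exc_type = NULL;
    toy_exc_msg = NULL;
}

/* Converts a digit character; returns 0 on success, or sets an
   exception and returns -1 on failure. */
static int toy_parse_digit(char c, int *out)
{
    if (c < '0' || c > '9') {
        toy_err_set_string("ValueError", "not a digit");
        return -1;
    }
    *out = c - '0';
    return 0;
}
\end{verbatim}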

The full exception state consists of three objects (all of which can
be \NULL): the exception type, the corresponding exception
value, and the traceback. These have the same meanings as the Python
\withsubitem{(in module sys)}{
  \ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}}
objects \code{sys.exc_type}, \code{sys.exc_value}, and
\code{sys.exc_traceback}; however, they are not the same: the Python
objects represent the last exception being handled by a Python
\keyword{try} \ldots\ \keyword{except} statement, while the C-level
exception state only exists while an exception is being passed on
between C functions until it reaches the Python bytecode interpreter's
main loop, which takes care of transferring it to \code{sys.exc_type}
and friends.

Note that starting with Python 1.5, the preferred, thread-safe way to
access the exception state from Python code is to call the function
\withsubitem{(in module sys)}{\ttindex{exc_info()}}
\function{sys.exc_info()}, which returns the per-thread exception state
for Python code. Also, the semantics of both ways to access the
exception state have changed so that a function which catches an
exception will save and restore its thread's exception state so as to
preserve the exception state of its caller. This prevents common bugs
in exception handling code caused by an innocent-looking function
overwriting the exception being handled; it also reduces the often
unwanted lifetime extension for objects that are referenced by the
stack frames in the traceback.

As a general principle, a function that calls another function to
perform some task should check whether the called function raised an
exception, and if so, pass the exception state on to its caller. It
should discard any object references that it owns, and return an
error indicator, but it should \emph{not} set another exception, since
that would overwrite the exception that was just raised and lose
important information about the exact cause of the error.
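The difference is easy to see in a toy model in plain C; the enumeration and the functions \cfunction{inner()}, \cfunction{outer()} and \cfunction{outer_bad()} are hypothetical. \cfunction{outer()} returns the error indicator and leaves the exception that \cfunction{inner()} set untouched, while \cfunction{outer_bad()} replaces it and loses the original cause.

\begin{verbatim}
#include <assert.h>

/* Toy single-slot exception state (hypothetical, not the Python API). */
enum toy_exc_kind { TOY_NO_EXC = 0, TOY_VALUE_ERROR, TOY_RUNTIME_ERROR };
static enum toy_exc_kind toy_exc = TOY_NO_EXC;

/* Fails for negative input, recording the real cause. */
static int inner(int x)
{
    if (x < 0) {
        toy_exc = TOY_VALUE_ERROR;
        return -1;
    }
    return 0;
}

/* Correct: pass the existing exception on unchanged. */
static int outer(int x)
{
    if (inner(x) < 0) {
        assert(toy_exc != TOY_NO_EXC);   /* the callee set the cause */
        return -1;                       /* no new exception here */
    }
    return 0;
}

/* Wrong: overwrites the exception that inner() set. */
static int outer_bad(int x)
{
    if (inner(x) < 0) {
        toy_exc = TOY_RUNTIME_ERROR;
        return -1;
    }
    return 0;
}
\end{verbatim}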

A simple example of detecting exceptions and passing them on is shown
in the \cfunction{sum_sequence()}\ttindex{sum_sequence()} example
above. It so happens that that example doesn't need to clean up any
owned references when it detects an error. The following example
function shows some error cleanup. First, to remind you why you like
Python, we show the equivalent Python code:

\begin{verbatim}
def incr_item(dict, key):
    try:
        item = dict[key]
    except KeyError:
        item = 0
    dict[key] = item + 1
\end{verbatim}
\ttindex{incr_item()}

Here is the corresponding C code, in all its glory:

\begin{verbatim}
int
incr_item(PyObject *dict, PyObject *key)
{
    /* Objects all initialized to NULL for Py_XDECREF */
    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
    int rv = -1; /* Return value initialized to -1 (failure) */

    item = PyObject_GetItem(dict, key);
    if (item == NULL) {
        /* Handle KeyError only: */
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;

        /* Clear the error and use zero: */
        PyErr_Clear();
        item = PyInt_FromLong(0L);
        if (item == NULL)
            goto error;
    }
    const_one = PyInt_FromLong(1L);
    if (const_one == NULL)
        goto error;

    incremented_item = PyNumber_Add(item, const_one);
    if (incremented_item == NULL)
        goto error;

    if (PyObject_SetItem(dict, key, incremented_item) < 0)
        goto error;
    rv = 0; /* Success */
    /* Continue with cleanup code */

 error:
    /* Cleanup code, shared by success and failure path */

    /* Use Py_XDECREF() to ignore NULL references */
    Py_XDECREF(item);
    Py_XDECREF(const_one);
    Py_XDECREF(incremented_item);

    return rv; /* -1 for error, 0 for success */
}
\end{verbatim}
\ttindex{incr_item()}

This example represents an endorsed use of the \keyword{goto} statement
in C! It illustrates the use of
\cfunction{PyErr_ExceptionMatches()}\ttindex{PyErr_ExceptionMatches()} and
\cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} to
handle specific exceptions, and the use of
\cfunction{Py_XDECREF()}\ttindex{Py_XDECREF()} to
dispose of owned references that may be \NULL{} (note the
\character{X} in the name; \cfunction{Py_DECREF()} would crash when
confronted with a \NULL{} reference). It is important that the
variables used to hold owned references are initialized to \NULL{} for
this to work; likewise, the proposed return value is initialized to
\code{-1} (failure) and only set to success after the final call made
is successful.
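The same pattern works for resources other than references. In this plain C sketch, a hypothetical example rather than part of the Python/C API, \cfunction{malloc()} and \cfunction{free()} stand in for owned references: every pointer starts as \NULL, \cfunction{free()} ignores \NULL{} much as \cfunction{Py_XDECREF()} does, and \code{rv} only becomes \code{0} once every step has succeeded.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed copy of s, or NULL on failure. */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Hypothetical example: stores a new string a+b in *out.
   Returns 0 on success, -1 on failure. */
static int join_words(const char *a, const char *b, char **out)
{
    /* All initialized to NULL so the shared cleanup is safe. */
    char *copy_a = NULL, *copy_b = NULL, *result = NULL;
    int rv = -1;                 /* assume failure */

    assert(a != NULL && b != NULL && out != NULL);
    copy_a = dup_string(a);
    if (copy_a == NULL)
        goto error;
    copy_b = dup_string(b);
    if (copy_b == NULL)
        goto error;
    result = malloc(strlen(copy_a) + strlen(copy_b) + 1);
    if (result == NULL)
        goto error;
    strcpy(result, copy_a);
    strcat(result, copy_b);

    *out = result;               /* ownership passes to the caller */
    result = NULL;               /* so cleanup must not free it */
    rv = 0;                      /* success */

 error:
    /* Cleanup shared by success and failure; free(NULL) is a no-op. */
    free(copy_a);
    free(copy_b);
    free(result);
    return rv;
}
\end{verbatim}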


\section{Embedding Python \label{embedding}}

The one important task that only embedders (as opposed to extension
writers) of the Python interpreter have to worry about is the
initialization, and possibly the finalization, of the Python
interpreter. Most functionality of the interpreter can only be used
after the interpreter has been initialized.

The basic initialization function is
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}.
This initializes the table of loaded modules, and creates the
fundamental modules \module{__builtin__}\refbimodindex{__builtin__},
\module{__main__}\refbimodindex{__main__}, \module{sys}\refbimodindex{sys},
and \module{exceptions}.\refbimodindex{exceptions} It also initializes
the module search path (\code{sys.path}).%
\indexiii{module}{search}{path}
\withsubitem{(in module sys)}{\ttindex{path}}

\cfunction{Py_Initialize()} does not set the ``script argument list''
(\code{sys.argv}). If this variable is needed by Python code that
will be executed later, it must be set explicitly with a call to
\code{PySys_SetArgv(\var{argc},
\var{argv})}\ttindex{PySys_SetArgv()} subsequent to the call to
\cfunction{Py_Initialize()}.

On most systems (in particular, on \UNIX{} and Windows, although the
details are slightly different),
\cfunction{Py_Initialize()} calculates the module search path based
upon its best guess for the location of the standard Python
interpreter executable, assuming that the Python library is found in a
fixed location relative to the Python interpreter executable. In
particular, it looks for a directory named
\file{lib/python\shortversion} relative to the parent directory where
the executable named \file{python} is found on the shell command
search path (the environment variable \envvar{PATH}).

For instance, if the Python executable is found in
\file{/usr/local/bin/python}, it will assume that the libraries are in
\file{/usr/local/lib/python\shortversion}. (In fact, this particular path
is also the ``fallback'' location, used when no executable file named
\file{python} is found along \envvar{PATH}.) The user can override
this behavior by setting the environment variable \envvar{PYTHONHOME},
or insert additional directories in front of the standard path by
setting \envvar{PYTHONPATH}.
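The default guess can be sketched as simple string manipulation. The function \cfunction{guess_libdir()} below is hypothetical and deliberately simplified: it assumes a \UNIX{}-style path to the executable and takes the version string as a parameter, whereas the real logic in \file{Modules/getpath.c} does considerably more, including honoring \envvar{PYTHONHOME} and \envvar{PYTHONPATH}.

\begin{verbatim}
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: given "/usr/local/bin/python" and "2.4",
   writes "/usr/local/lib/python2.4" into buf.  Returns 0 on success,
   or -1 if the path has no parent directory or buf is too small. */
static int guess_libdir(const char *exe, const char *version,
                        char *buf, size_t bufsize)
{
    const char *last = strrchr(exe, '/');  /* slash before "python" */
    const char *p;
    size_t prefix_len;
    int n;

    assert(version != NULL && buf != NULL);
    if (last == NULL)
        return -1;
    /* Walk back to just after the slash that precedes "bin". */
    for (p = last; p > exe && p[-1] != '/'; p--)
        ;
    if (p == exe)
        return -1;
    prefix_len = (size_t)(p - 1 - exe);    /* length of "/usr/local" */
    n = snprintf(buf, bufsize, "%.*s/lib/python%s",
                 (int)prefix_len, exe, version);
    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return 0;
}
\end{verbatim}

With \file{/usr/local/bin/python} and the example version string \code{"2.4"}, the sketch produces \file{/usr/local/lib/python2.4}, matching the example in the text.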

The embedding application can steer the search by calling
\code{Py_SetProgramName(\var{file})}\ttindex{Py_SetProgramName()} \emph{before} calling
\cfunction{Py_Initialize()}. Note that \envvar{PYTHONHOME} still
overrides this and \envvar{PYTHONPATH} is still inserted in front of
the standard path. An application that requires total control has to
provide its own implementation of
\cfunction{Py_GetPath()}\ttindex{Py_GetPath()},
\cfunction{Py_GetPrefix()}\ttindex{Py_GetPrefix()},
\cfunction{Py_GetExecPrefix()}\ttindex{Py_GetExecPrefix()}, and
\cfunction{Py_GetProgramFullPath()}\ttindex{Py_GetProgramFullPath()} (all
defined in \file{Modules/getpath.c}).

Sometimes, it is desirable to ``uninitialize'' Python. For instance,
the application may want to start over (make another call to
\cfunction{Py_Initialize()}) or the application is simply done with its
use of Python and wants to free all memory allocated by Python. This
can be accomplished by calling \cfunction{Py_Finalize()}. The function
\cfunction{Py_IsInitialized()}\ttindex{Py_IsInitialized()} returns
true if Python is currently in the initialized state. More
information about these functions is given in a later chapter.
566information about these functions is given in a later chapter.