\chapter{Introduction \label{intro}}


The Application Programmer's Interface to Python gives C and
\Cpp{} programmers access to the Python interpreter at a variety of
levels. The API is equally usable from \Cpp, but for brevity it is
generally referred to as the Python/C API. There are two
fundamentally different reasons for using the Python/C API. The first
reason is to write \emph{extension modules} for specific purposes;
these are C modules that extend the Python interpreter. This is
probably the most common use. The second reason is to use Python as a
component in a larger application; this technique is generally
referred to as \dfn{embedding} Python in an application.

Writing an extension module is a relatively well-understood process,
where a ``cookbook'' approach works well. There are several tools
that automate the process to some extent. While people have embedded
Python in other applications since its early existence, the process of
embedding Python is less straightforward than writing an extension.

Many API functions are useful independent of whether you're embedding
or extending Python; moreover, most applications that embed Python
will need to provide a custom extension as well, so it's probably a
good idea to become familiar with writing an extension before
attempting to embed Python in a real application.


\section{Include Files \label{includes}}

All function, type and macro definitions needed to use the Python/C
API are included in your code by the following line:

\begin{verbatim}
#include "Python.h"
\end{verbatim}

This implies inclusion of the following standard headers:
\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>},
\code{<limits.h>}, and \code{<stdlib.h>} (if available).
Since Python may define some preprocessor macros that affect
the standard headers on some systems, you must include \file{Python.h}
before any standard headers are included.

All user-visible names defined by \file{Python.h} (except those defined by
the included standard headers) have one of the prefixes \samp{Py} or
\samp{_Py}. Names beginning with \samp{_Py} are for internal use by
the Python implementation and should not be used by extension writers.
Structure member names do not have a reserved prefix.

\strong{Important:} user code should never define names that begin
with \samp{Py} or \samp{_Py}. This confuses the reader, and
jeopardizes the portability of the user code to future Python
versions, which may define additional names beginning with one of
these prefixes.

The header files are typically installed with Python. On \UNIX, these
are located in the directories
\file{\envvar{prefix}/include/python\var{version}/} and
\file{\envvar{exec_prefix}/include/python\var{version}/}, where
\envvar{prefix} and \envvar{exec_prefix} are defined by the
corresponding parameters to Python's \program{configure} script and
\var{version} is \code{sys.version[:3]}. On Windows, the headers are
installed in \file{\envvar{prefix}/include}, where \envvar{prefix} is
the installation directory specified to the installer.

To include the headers, place both directories (if different) on your
compiler's search path for includes. Do \emph{not} place the parent
directories on the search path and then use
\samp{\#include <python\shortversion/Python.h>}; this will break on
multi-platform builds since the platform-independent headers under
\envvar{prefix} include the platform-specific headers from
\envvar{exec_prefix}.

\Cpp{} users should note that though the API is defined entirely using
C, the header files do properly declare the entry points to be
\code{extern "C"}, so there is no need to do anything special to use
the API from \Cpp.


\section{Objects, Types and Reference Counts \label{objects}}

Most Python/C API functions have one or more arguments as well as a
return value of type \ctype{PyObject*}. This type is a pointer
to an opaque data type representing an arbitrary Python
object. Since all Python object types are treated the same way by the
Python language in most situations (e.g., assignments, scope rules,
and argument passing), it is only fitting that they should be
represented by a single C type. Almost all Python objects live on the
heap: you never declare an automatic or static variable of type
\ctype{PyObject}, only pointer variables of type \ctype{PyObject*} can
be declared. The sole exceptions are the type objects\obindex{type};
since these must never be deallocated, they are typically static
\ctype{PyTypeObject} objects.

All Python objects (even Python integers) have a \dfn{type} and a
\dfn{reference count}. An object's type determines what kind of object
it is (e.g., an integer, a list, or a user-defined function; there are
many more as explained in the \citetitle[../ref/ref.html]{Python
Reference Manual}). For each of the well-known types there is a macro
to check whether an object is of that type; for instance,
\samp{PyList_Check(\var{a})} is true if (and only if) the object
pointed to by \var{a} is a Python list.


\subsection{Reference Counts \label{refcounts}}

The reference count is important because today's computers have a
finite (and often severely limited) memory size; it counts how many
different places there are that have a reference to an object. Such a
place could be another object, or a global (or static) C variable, or
a local variable in some C function. When an object's reference count
becomes zero, the object is deallocated. If it contains references to
other objects, their reference count is decremented. Those other
objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on. (There's an obvious problem
with objects that reference each other here; for now, the solution is
``don't do that.'')

Reference counts are always manipulated explicitly. The normal way is
to use the macro \cfunction{Py_INCREF()}\ttindex{Py_INCREF()} to
increment an object's reference count by one, and
\cfunction{Py_DECREF()}\ttindex{Py_DECREF()} to decrement it by
one. The \cfunction{Py_DECREF()} macro is considerably more complex
than the incref one, since it must check whether the reference count
becomes zero and then cause the object's deallocator to be called.
The deallocator is a function pointer contained in the object's type
structure. The type-specific deallocator takes care of decrementing
the reference counts for other objects contained in the object if this
is a compound object type, such as a list, as well as performing any
additional finalization that's needed. There's no chance that the
reference count can overflow; at least as many bits are used to hold
the reference count as there are distinct memory locations in virtual
memory (assuming \code{sizeof(long) >= sizeof(char*)}). Thus, the
reference count increment is a simple operation.
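
To make this mechanism concrete, here is a toy model in plain C. It is
a sketch under simplifying assumptions, not the interpreter's actual
object layout: the \code{toy_*} names are hypothetical, and each toy
object carries only a reference count, a pointer to a type structure
holding the deallocator, and at most one contained reference.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not CPython's object layout. */
typedef struct toy_object toy_object;

typedef struct {
    void (*dealloc)(toy_object *self);  /* type-specific deallocator */
} toy_type;

struct toy_object {
    long refcnt;                /* number of references to this object */
    const toy_type *type;       /* the deallocator lives in the type */
    toy_object *child;          /* one contained reference, or NULL */
};

static int toy_freed = 0;       /* counts deallocations for the demo */

static void toy_incref(toy_object *o)
{
    o->refcnt++;
}

static void toy_decref(toy_object *o)
{
    if (--o->refcnt == 0)
        o->type->dealloc(o);    /* last reference gone: deallocate */
}

static void toy_pair_dealloc(toy_object *self)
{
    if (self->child != NULL)
        toy_decref(self->child);  /* may deallocate the child in turn */
    free(self);
    toy_freed++;
}

static const toy_type toy_pair_type = { toy_pair_dealloc };

/* Returns a new reference (refcnt 1). The pair takes its own
   reference to 'child', so the caller keeps the one it had. */
static toy_object *toy_pair_new(toy_object *child)
{
    toy_object *o = malloc(sizeof *o);

    if (o == NULL)
        return NULL;
    o->refcnt = 1;
    o->type = &toy_pair_type;
    o->child = child;
    if (child != NULL)
        toy_incref(child);
    return o;
}
\end{verbatim}

If \code{outer} holds the last remaining reference to \code{inner}, a
single \code{toy_decref(outer)} deallocates both objects, much as a
list's deallocator decrements the reference counts of its items.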

It is not necessary to increment an object's reference count for every
local variable that contains a pointer to an object. In theory, the
object's reference count goes up by one when the variable is made to
point to it and it goes down by one when the variable goes out of
scope. However, these two cancel each other out, so at the end the
reference count hasn't changed. The only real reason to use the
reference count is to prevent the object from being deallocated as
long as our variable is pointing to it. If we know that there is at
least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count
temporarily. An important situation where this arises is in objects
that are passed as arguments to C functions in an extension module
that are called from Python; the call mechanism guarantees to hold a
reference to every argument for the duration of the call.

However, a common pitfall is to extract an object from a list and
hold on to it for a while without incrementing its reference count.
Some other operation might conceivably remove the object from the
list, decrementing its reference count and possibly deallocating it.
The real danger is that innocent-looking operations may invoke
arbitrary Python code which could do this; there is a code path which
allows control to flow back to the user from a \cfunction{Py_DECREF()},
so almost any operation is potentially dangerous.
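
The same pitfall can be reproduced in a toy model written in plain C.
This is not the Python/C API: the \code{toy_*} names and the one-slot
container are hypothetical. The container hands out a borrowed
reference; the safe pattern turns it into an owned reference before
calling anything that might clear the container, which is the role
\cfunction{Py_INCREF()} and \cfunction{Py_DECREF()} play around a
call such as \cfunction{PyList_GetItem()}.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>

/* Toy model only: a refcounted item and a one-slot container. */
typedef struct {
    long refcnt;
} toy_item;

typedef struct {
    toy_item *slot;             /* the container owns one reference */
} toy_box;

static int toy_items_freed = 0;

static void toy_item_incref(toy_item *it) { it->refcnt++; }

static void toy_item_decref(toy_item *it)
{
    if (--it->refcnt == 0) {
        free(it);
        toy_items_freed++;
    }
}

/* Returns a borrowed reference: the caller must not decref it. */
static toy_item *toy_box_get(toy_box *b) { return b->slot; }

/* Drops the container's reference, possibly freeing the item. */
static void toy_box_clear(toy_box *b)
{
    toy_item *old = b->slot;

    b->slot = NULL;
    if (old != NULL)
        toy_item_decref(old);
}

/* Safe pattern: own a reference while the container is modified.
   Returns the item's reference count seen after the clear. */
static long toy_use_item_safely(toy_box *b)
{
    toy_item *it = toy_box_get(b);  /* borrowed */
    long seen;

    toy_item_incref(it);            /* now we own a reference */
    toy_box_clear(b);               /* container drops its reference */
    seen = it->refcnt;              /* still valid: we hold one */
    toy_item_decref(it);            /* done: release our reference */
    return seen;
}
\end{verbatim}

Without the increment, \code{toy_box_clear()} would already have freed
the item, and any later use of \code{it} would be undefined behavior.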

A safe approach is to always use the generic operations (functions
whose name begins with \samp{PyObject_}, \samp{PyNumber_},
\samp{PySequence_} or \samp{PyMapping_}). These operations always
increment the reference count of the object they return. This leaves
the caller with the responsibility to call
\cfunction{Py_DECREF()} when they are done with the result; this soon
becomes second nature.


\subsubsection{Reference Count Details \label{refcountDetails}}

The reference count behavior of functions in the Python/C API is best
explained in terms of \emph{ownership of references}. Note that we
talk of owning references, never of owning objects; objects are always
shared! When a function owns a reference, it has to dispose of it
properly --- either by passing ownership on (usually to its caller) or
by calling \cfunction{Py_DECREF()} or \cfunction{Py_XDECREF()}. When
a function passes ownership of a reference on to its caller, the
caller is said to receive a \emph{new} reference. When no ownership
is transferred, the caller is said to \emph{borrow} the reference.
Nothing needs to be done for a borrowed reference.

Conversely, when a calling function passes in a reference to an
object, there are two possibilities: the called function \emph{steals}
a reference to the object, or it does not. Few functions steal
references; the two notable exceptions are
\cfunction{PyList_SetItem()}\ttindex{PyList_SetItem()} and
\cfunction{PyTuple_SetItem()}\ttindex{PyTuple_SetItem()}, which
steal a reference to the item (but not to the tuple or list into which
the item is put!). These functions were designed to steal a reference
because of a common idiom for populating a tuple or list with newly
created objects; for example, the code to create the tuple \code{(1,
2, "three")} could look like this (forgetting about error handling for
the moment; a better way to code this is shown below):

\begin{verbatim}
PyObject *t;

t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyInt_FromLong(1L));
PyTuple_SetItem(t, 1, PyInt_FromLong(2L));
PyTuple_SetItem(t, 2, PyString_FromString("three"));
\end{verbatim}

Incidentally, \cfunction{PyTuple_SetItem()} is the \emph{only} way to
set tuple items; \cfunction{PySequence_SetItem()} and
\cfunction{PyObject_SetItem()} refuse to do this since tuples are an
immutable data type. You should only use
\cfunction{PyTuple_SetItem()} for tuples that you are creating
yourself.

Equivalent code for populating a list can be written using
\cfunction{PyList_New()} and \cfunction{PyList_SetItem()}. Such code
can also use \cfunction{PySequence_SetItem()}; this illustrates the
difference between the two (the extra \cfunction{Py_DECREF()} calls):

\begin{verbatim}
PyObject *l, *x;

l = PyList_New(3);
x = PyInt_FromLong(1L);
PySequence_SetItem(l, 0, x); Py_DECREF(x);
x = PyInt_FromLong(2L);
PySequence_SetItem(l, 1, x); Py_DECREF(x);
x = PyString_FromString("three");
PySequence_SetItem(l, 2, x); Py_DECREF(x);
\end{verbatim}

You might find it strange that the ``recommended'' approach takes more
code. However, in practice, you will rarely use these ways of
creating and populating a tuple or list. There's a generic function,
\cfunction{Py_BuildValue()}, that can create most common objects from
C values, directed by a \dfn{format string}. For example, the
above two blocks of code could be replaced by the following (which
also takes care of the error checking):

\begin{verbatim}
PyObject *t, *l;

t = Py_BuildValue("(iis)", 1, 2, "three");
l = Py_BuildValue("[iis]", 1, 2, "three");
\end{verbatim}
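
The two ways of populating a list with \cfunction{PyList_SetItem()}
and \cfunction{PySequence_SetItem()} differ only in who gives up a
reference. A toy model in plain C makes this explicit; the
\code{toy_*} names are hypothetical and the model is not the
Python/C API. \code{toy_list_set_steal()} imitates the reference
behavior of \cfunction{PyList_SetItem()} by taking over the caller's
reference, while \code{toy_list_set()} imitates
\cfunction{PySequence_SetItem()} by adding its own, so its caller must
still drop the reference it created.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>

/* Toy model only: a refcounted item and a fixed-size list. */
typedef struct {
    long refcnt;
} toy_item;

typedef struct {
    toy_item *items[3];         /* each non-NULL slot owns a reference */
} toy_list;

static toy_item *toy_item_new(void)   /* returns a new reference */
{
    toy_item *it = malloc(sizeof *it);

    if (it != NULL)
        it->refcnt = 1;
    return it;
}

static void toy_item_decref(toy_item *it)
{
    if (--it->refcnt == 0)
        free(it);
}

/* Steals the caller's reference to 'it'. */
static void toy_list_set_steal(toy_list *l, int i, toy_item *it)
{
    toy_item *old;

    assert(i >= 0 && i < 3);
    old = l->items[i];
    l->items[i] = it;           /* no increment: ownership moves in */
    if (old != NULL)
        toy_item_decref(old);   /* release the replaced item */
}

/* Does not steal: takes its own reference first. */
static void toy_list_set(toy_list *l, int i, toy_item *it)
{
    it->refcnt++;
    toy_list_set_steal(l, i, it);
}
\end{verbatim}

Either way each slot ends up holding exactly one reference; the
stealing form merely saves the caller a decrement, which is why it
suits filling a freshly created container with newly created objects.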

It is much more common to use \cfunction{PyObject_SetItem()} and
friends with items whose references you are only borrowing, like
arguments that were passed in to the function you are writing. In
that case, their behavior regarding reference counts is much saner,
since you don't have to increment a reference count so you can give a
reference away (``have it be stolen''). For example, this function
sets all items of a list (actually, any mutable sequence) to a given
item:

\begin{verbatim}
int
set_all(PyObject *target, PyObject *item)
{
    int i, n;

    n = PyObject_Length(target);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        /* PySequence_SetItem() takes a C integer index and
           does not steal the reference to item */
        if (PySequence_SetItem(target, i, item) < 0)
            return -1;
    }
    return 0;
}
\end{verbatim}
\ttindex{set_all()}

The situation is slightly different for function return values.
While passing a reference to most functions does not change your
ownership responsibilities for that reference, many functions that
return a reference to an object give you ownership of the reference.
The reason is simple: in many cases, the returned object is created
on the fly, and the reference you get is the only reference to the
object. Therefore, the generic functions that return object
references, like \cfunction{PyObject_GetItem()} and
\cfunction{PySequence_GetItem()}, always return a new reference (the
caller becomes the owner of the reference).

It is important to realize that whether you own a reference returned
by a function depends on which function you call only --- \emph{the
plumage} (the type of the object passed as an
argument to the function) \emph{doesn't enter into it!} Thus, if you
extract an item from a list using \cfunction{PyList_GetItem()}, you
don't own the reference --- but if you obtain the same item from the
same list using \cfunction{PySequence_GetItem()} (which happens to
take exactly the same arguments), you do own a reference to the
returned object.

Here is an example of how you could write a function that computes the
sum of the items in a list of integers: once using
\cfunction{PyList_GetItem()}\ttindex{PyList_GetItem()}, and once using
\cfunction{PySequence_GetItem()}\ttindex{PySequence_GetItem()}.

\begin{verbatim}
long
sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    for (i = 0; i < n; i++) {
        item = PyList_GetItem(list, i); /* Can't fail */
        if (!PyInt_Check(item)) continue; /* Skip non-integers */
        total += PyInt_AsLong(item);
    }
    return total;
}
\end{verbatim}
\ttindex{sum_list()}

\begin{verbatim}
long
sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;
    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length */
    for (i = 0; i < n; i++) {
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyInt_Check(item))
            total += PyInt_AsLong(item);
        Py_DECREF(item); /* Discard reference ownership */
    }
    return total;
}
\end{verbatim}
\ttindex{sum_sequence()}


\subsection{Types \label{types}}

There are few other data types that play a significant role in
the Python/C API; most are simple C types such as \ctype{int},
\ctype{long}, \ctype{double} and \ctype{char*}. A few structure types
are used to describe static tables used to list the functions exported
by a module or the data attributes of a new object type, and another
is used to describe the value of a complex number. These will
be discussed together with the functions that use them.


\section{Exceptions \label{exceptions}}

The Python programmer only needs to deal with exceptions if specific
error handling is required; unhandled exceptions are automatically
propagated to the caller, then to the caller's caller, and so on, until
they reach the top-level interpreter, where they are reported to the
user accompanied by a stack traceback.

For C programmers, however, error checking always has to be explicit.
All functions in the Python/C API can raise exceptions, unless an
explicit claim is made otherwise in a function's documentation. In
general, when a function encounters an error, it sets an exception,
discards any object references that it owns, and returns an
error indicator --- usually \NULL{} or \code{-1}. A few functions
return a Boolean true/false result, with false indicating an error.
Very few functions return no explicit error indicator or have an
ambiguous return value, and require explicit testing for errors with
\cfunction{PyErr_Occurred()}\ttindex{PyErr_Occurred()}.

Exception state is maintained in per-thread storage (this is
equivalent to using global storage in an unthreaded application). A
thread can be in one of two states: an exception has occurred, or not.
The function \cfunction{PyErr_Occurred()} can be used to check for
this: it returns a borrowed reference to the exception type object
when an exception has occurred, and \NULL{} otherwise. There are a
number of functions to set the exception state:
\cfunction{PyErr_SetString()}\ttindex{PyErr_SetString()} is the most
common (though not the most general) function to set the exception
state, and \cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} clears the
exception state.

The full exception state consists of three objects (all of which can
be \NULL): the exception type, the corresponding exception
value, and the traceback. These have the same meanings as the Python
\withsubitem{(in module sys)}{
  \ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}}
objects \code{sys.exc_type}, \code{sys.exc_value}, and
\code{sys.exc_traceback}; however, they are not the same: the Python
objects represent the last exception being handled by a Python
\keyword{try} \ldots\ \keyword{except} statement, while the C level
exception state only exists while an exception is being passed on
between C functions until it reaches the Python bytecode interpreter's
main loop, which takes care of transferring it to \code{sys.exc_type}
and friends.

Note that starting with Python 1.5, the preferred, thread-safe way to
access the exception state from Python code is to call the function
\withsubitem{(in module sys)}{\ttindex{exc_info()}}
\function{sys.exc_info()}, which returns the per-thread exception state
for Python code. Also, the semantics of both ways to access the
exception state have changed so that a function which catches an
exception will save and restore its thread's exception state so as to
preserve the exception state of its caller. This prevents common bugs
in exception handling code caused by an innocent-looking function
overwriting the exception being handled; it also reduces the often
unwanted lifetime extension for objects that are referenced by the
stack frames in the traceback.

As a general principle, a function that calls another function to
perform some task should check whether the called function raised an
exception, and if so, pass the exception state on to its caller. It
should discard any object references that it owns, and return an
error indicator, but it should \emph{not} set another exception ---
that would overwrite the exception that was just raised, and lose
important information about the exact cause of the error.
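
The propagation rule can be illustrated without the interpreter. In
this toy model in plain C, all names are hypothetical and a single
global pointer stands in for the per-thread exception state. The
innermost function sets the error indicator and returns \code{-1}; the
middle function notices the failure, releases the memory it owns, and
returns \code{-1} \emph{without} setting a new error, so the original
message reaches the top-level caller unchanged.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the exception state: NULL means "no error". */
static const char *toy_error = NULL;

static void toy_set_error(const char *msg) { toy_error = msg; }

/* Innermost call: fails for negative input by setting the error. */
static int toy_parse(int value)
{
    if (value < 0) {
        toy_set_error("negative value");
        return -1;                  /* error indicator */
    }
    return value;
}

/* Middle layer: owns a buffer and propagates errors unchanged. */
static int toy_process(int value)
{
    int *buffer = malloc(sizeof *buffer);
    int result;

    if (buffer == NULL) {
        toy_set_error("out of memory"); /* our own failure */
        return -1;
    }
    result = toy_parse(value);
    if (result < 0) {
        free(buffer);               /* discard what we own ... */
        return -1;                  /* ... but set no new error */
    }
    *buffer = result * 2;
    result = *buffer;
    free(buffer);
    return result;
}
\end{verbatim}

After a failing call such as \code{toy_process(-1)}, the caller still
sees the message set by \code{toy_parse()}, not one invented by the
middle layer.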

A simple example of detecting exceptions and passing them on is shown
in the \cfunction{sum_sequence()}\ttindex{sum_sequence()} example
above. It so happens that this example doesn't need to clean up any
owned references when it detects an error. The following example
function shows some error cleanup. First, to remind you why you like
Python, we show the equivalent Python code:

\begin{verbatim}
def incr_item(dict, key):
    try:
        item = dict[key]
    except KeyError:
        item = 0
    dict[key] = item + 1
\end{verbatim}
\ttindex{incr_item()}

Here is the corresponding C code, in all its glory:

\begin{verbatim}
int
incr_item(PyObject *dict, PyObject *key)
{
    /* Objects all initialized to NULL for Py_XDECREF */
    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
    int rv = -1; /* Return value initialized to -1 (failure) */

    item = PyObject_GetItem(dict, key);
    if (item == NULL) {
        /* Handle KeyError only: */
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;

        /* Clear the error and use zero: */
        PyErr_Clear();
        item = PyInt_FromLong(0L);
        if (item == NULL)
            goto error;
    }
    const_one = PyInt_FromLong(1L);
    if (const_one == NULL)
        goto error;

    incremented_item = PyNumber_Add(item, const_one);
    if (incremented_item == NULL)
        goto error;

    if (PyObject_SetItem(dict, key, incremented_item) < 0)
        goto error;
    rv = 0; /* Success */
    /* Continue with cleanup code */

 error:
    /* Cleanup code, shared by success and failure path */

    /* Use Py_XDECREF() to ignore NULL references */
    Py_XDECREF(item);
    Py_XDECREF(const_one);
    Py_XDECREF(incremented_item);

    return rv; /* -1 for error, 0 for success */
}
\end{verbatim}
\ttindex{incr_item()}

This example represents an endorsed use of the \keyword{goto} statement
in C! It illustrates the use of
\cfunction{PyErr_ExceptionMatches()}\ttindex{PyErr_ExceptionMatches()} and
\cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} to
handle specific exceptions, and the use of
\cfunction{Py_XDECREF()}\ttindex{Py_XDECREF()} to
dispose of owned references that may be \NULL{} (note the
\character{X} in the name; \cfunction{Py_DECREF()} would crash when
confronted with a \NULL{} reference). It is important that the
variables used to hold owned references are initialized to \NULL{} for
this to work; likewise, the proposed return value is initialized to
\code{-1} (failure) and only set to success after the final call made
is successful.
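
The single-exit cleanup idiom is not specific to Python references.
The following sketch is plain C with a hypothetical helper,
\code{toy_copy_twice()}; it relies on the fact that \cfunction{free()}
accepts a \NULL{} pointer, much as \cfunction{Py_XDECREF()} accepts a
\NULL{} reference, which is why both pointers start out as \NULL{} and
the return value starts out as \code{-1}.

\begin{verbatim}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies 'src' into two fresh buffers and
   reports whether the copies are equal. Returns -1 on allocation
   failure and 0 on success. */
static int toy_copy_twice(const char *src, int *equal)
{
    char *first = NULL, *second = NULL; /* NULL: cleanup always safe */
    size_t len = strlen(src) + 1;
    int rv = -1;                        /* assume failure */

    first = malloc(len);
    if (first == NULL)
        goto error;
    memcpy(first, src, len);

    second = malloc(len);
    if (second == NULL)
        goto error;
    memcpy(second, first, len);

    *equal = (strcmp(first, second) == 0);
    rv = 0;                             /* success */
    /* fall through into the shared cleanup */

 error:
    free(first);                        /* free(NULL) is a no-op */
    free(second);
    return rv;
}
\end{verbatim}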


\section{Embedding Python \label{embedding}}

The one important task that only embedders (as opposed to extension
writers) of the Python interpreter have to worry about is the
initialization, and possibly the finalization, of the Python
interpreter. Most functionality of the interpreter can only be used
after the interpreter has been initialized.

The basic initialization function is
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}.
This initializes the table of loaded modules, and creates the
fundamental modules \module{__builtin__}\refbimodindex{__builtin__},
\module{__main__}\refbimodindex{__main__}, \module{sys}\refbimodindex{sys},
and \module{exceptions}.\refbimodindex{exceptions} It also initializes
the module search path (\code{sys.path}).%
\indexiii{module}{search}{path}
\withsubitem{(in module sys)}{\ttindex{path}}

\cfunction{Py_Initialize()} does not set the ``script argument list''
(\code{sys.argv}). If this variable is needed by Python code that
will be executed later, it must be set explicitly with a call to
\code{PySys_SetArgv(\var{argc},
\var{argv})}\ttindex{PySys_SetArgv()} subsequent to the call to
\cfunction{Py_Initialize()}.

On most systems (in particular, on \UNIX{} and Windows, although the
details are slightly different),
\cfunction{Py_Initialize()} calculates the module search path based
upon its best guess for the location of the standard Python
interpreter executable, assuming that the Python library is found in a
fixed location relative to the Python interpreter executable. In
particular, it looks for a directory named
\file{lib/python\shortversion} relative to the parent directory where
the executable named \file{python} is found on the shell command
search path (the environment variable \envvar{PATH}).

For instance, if the Python executable is found in
\file{/usr/local/bin/python}, it will assume that the libraries are in
\file{/usr/local/lib/python\shortversion}. (In fact, this particular path
is also the ``fallback'' location, used when no executable file named
\file{python} is found along \envvar{PATH}.) The user can override
this behavior by setting the environment variable \envvar{PYTHONHOME},
or insert additional directories in front of the standard path by
setting \envvar{PYTHONPATH}.

The embedding application can steer the search by calling
\code{Py_SetProgramName(\var{file})}\ttindex{Py_SetProgramName()} \emph{before} calling
\cfunction{Py_Initialize()}. Note that \envvar{PYTHONHOME} still
overrides this and \envvar{PYTHONPATH} is still inserted in front of
the standard path. An application that requires total control has to
provide its own implementation of
\cfunction{Py_GetPath()}\ttindex{Py_GetPath()},
\cfunction{Py_GetPrefix()}\ttindex{Py_GetPrefix()},
\cfunction{Py_GetExecPrefix()}\ttindex{Py_GetExecPrefix()}, and
\cfunction{Py_GetProgramFullPath()}\ttindex{Py_GetProgramFullPath()} (all
defined in \file{Modules/getpath.c}).

Sometimes, it is desirable to ``uninitialize'' Python. For instance,
the application may want to start over (make another call to
\cfunction{Py_Initialize()}) or the application is simply done with its
use of Python and wants to free all memory allocated by Python. This
can be accomplished by calling \cfunction{Py_Finalize()}. The function
\cfunction{Py_IsInitialized()}\ttindex{Py_IsInitialized()} returns
true if Python is currently in the initialized state. More
information about these functions is given in a later chapter.
562information about these functions is given in a later chapter.