\chapter{Introduction \label{intro}}


The Application Programmer's Interface to Python gives C and
\Cpp{} programmers access to the Python interpreter at a variety of
levels. The API is equally usable from \Cpp, but for brevity it is
generally referred to as the Python/C API. There are two
fundamentally different reasons for using the Python/C API. The first
reason is to write \emph{extension modules} for specific purposes;
these are C modules that extend the Python interpreter. This is
probably the most common use. The second reason is to use Python as a
component in a larger application; this technique is generally
referred to as \dfn{embedding} Python in an application.

Writing an extension module is a relatively well-understood process,
where a ``cookbook'' approach works well. There are several tools
that automate the process to some extent. While people have embedded
Python in other applications since its early existence, the process of
embedding Python is less straightforward than writing an extension.

Many API functions are useful independent of whether you're embedding
or extending Python; moreover, most applications that embed Python
will need to provide a custom extension as well, so it's probably a
good idea to become familiar with writing an extension before
attempting to embed Python in a real application.


\section{Include Files \label{includes}}

All function, type and macro definitions needed to use the Python/C
API are included in your code by the following line:

\begin{verbatim}
#include "Python.h"
\end{verbatim}

This implies inclusion of the following standard headers:
\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>},
\code{<limits.h>}, and \code{<stdlib.h>} (if available).

\begin{notice}[warning]
  Since Python may define some pre-processor definitions which affect
  the standard headers on some systems, you \emph{must} include
  \file{Python.h} before any standard headers are included.
\end{notice}

All user-visible names defined by \file{Python.h} (except those defined
by the included standard headers) have one of the prefixes \samp{Py} or
\samp{_Py}. Names beginning with \samp{_Py} are for internal use by
the Python implementation and should not be used by extension writers.
Structure member names do not have a reserved prefix.

\strong{Important:} user code should never define names that begin
with \samp{Py} or \samp{_Py}. Doing so confuses the reader and
jeopardizes the portability of the user code to future Python
versions, which may define additional names beginning with one of
these prefixes.

The header files are typically installed with Python. On \UNIX, these
are located in the directories
\file{\envvar{prefix}/include/python\var{version}/} and
\file{\envvar{exec_prefix}/include/python\var{version}/}, where
\envvar{prefix} and \envvar{exec_prefix} are defined by the
corresponding parameters to Python's \program{configure} script and
\var{version} is \code{sys.version[:3]}. On Windows, the headers are
installed in \file{\envvar{prefix}/include}, where \envvar{prefix} is
the installation directory specified to the installer.

To include the headers, place both directories (if different) on your
compiler's search path for includes. Do \emph{not} place the parent
directories on the search path and then use
\samp{\#include <python\shortversion/Python.h>}; this will break on
multi-platform builds since the platform-independent headers under
\envvar{prefix} include the platform-specific headers from
\envvar{exec_prefix}.

\Cpp{} users should note that though the API is defined entirely using
C, the header files do properly declare the entry points to be
\code{extern "C"}, so there is no need to do anything special to use
the API from \Cpp.


\section{Objects, Types and Reference Counts \label{objects}}

Most Python/C API functions have one or more arguments as well as a
return value of type \ctype{PyObject*}. This type is a pointer
to an opaque data type representing an arbitrary Python
object. Since all Python object types are treated the same way by the
Python language in most situations (e.g., assignments, scope rules,
and argument passing), it is only fitting that they should be
represented by a single C type. Almost all Python objects live on the
heap: you never declare an automatic or static variable of type
\ctype{PyObject}; only pointer variables of type \ctype{PyObject*} can
be declared. The sole exceptions are the type objects\obindex{type};
since these must never be deallocated, they are typically static
\ctype{PyTypeObject} objects.

All Python objects (even Python integers) have a \dfn{type} and a
\dfn{reference count}. An object's type determines what kind of object
it is (e.g., an integer, a list, or a user-defined function; there are
many more as explained in the \citetitle[../ref/ref.html]{Python
Reference Manual}). For each of the well-known types there is a macro
to check whether an object is of that type; for instance,
\samp{PyList_Check(\var{a})} is true if (and only if) the object
pointed to by \var{a} is a Python list.


\subsection{Reference Counts \label{refcounts}}

The reference count is important because today's computers have a
finite (and often severely limited) memory size; it counts how many
different places there are that have a reference to an object. Such a
place could be another object, or a global (or static) C variable, or
a local variable in some C function. When an object's reference count
becomes zero, the object is deallocated. If it contains references to
other objects, their reference counts are decremented. Those other
objects may be deallocated in turn, if this decrement makes their
reference count become zero, and so on. (There's an obvious problem
with objects that reference each other here; for now, the solution is
``don't do that.'')

Reference counts are always manipulated explicitly. The normal way is
to use the macro \cfunction{Py_INCREF()}\ttindex{Py_INCREF()} to
increment an object's reference count by one, and
\cfunction{Py_DECREF()}\ttindex{Py_DECREF()} to decrement it by
one. The \cfunction{Py_DECREF()} macro is considerably more complex
than \cfunction{Py_INCREF()}, since it must check whether the
reference count becomes zero and then cause the object's deallocator
to be called.
The deallocator is a function pointer contained in the object's type
structure. The type-specific deallocator takes care of decrementing
the reference counts for other objects contained in the object if this
is a compound object type, such as a list, as well as performing any
additional finalization that's needed. There's no chance that the
reference count can overflow; at least as many bits are used to hold
the reference count as there are distinct memory locations in virtual
memory (assuming \code{sizeof(long) >= sizeof(char*)}). Thus, the
reference count increment is a simple operation.
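
The mechanism just described can be modelled in a few lines of plain
C. The following is a simplified sketch and not the interpreter's
actual implementation: \ctype{Obj}, \ctype{Type},
\cfunction{obj_new()}, \cfunction{obj_incref()} and
\cfunction{obj_decref()} are hypothetical stand-ins for the real
object header and for \cfunction{Py_INCREF()} and
\cfunction{Py_DECREF()}, and each object holds at most one contained
object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified model; the real layout lives in Python.h. */
typedef struct Obj Obj;

typedef struct {
    void (*dealloc)(Obj *self);   /* type-specific deallocator */
} Type;

struct Obj {
    long refcnt;                  /* number of places holding a reference */
    const Type *type;
    Obj *child;                   /* one contained object, or NULL */
};

static int live_objects = 0;      /* bookkeeping for the demonstration only */

static void obj_incref(Obj *o)
{
    o->refcnt++;                  /* the increment is a simple operation */
}

static void obj_decref(Obj *o)
{
    assert(o->refcnt > 0);
    /* The decrement must check for zero and call the deallocator. */
    if (--o->refcnt == 0)
        o->type->dealloc(o);
}

/* Deallocator for a compound object: release the contained reference,
   which may deallocate the child in turn, then free the object. */
static void obj_dealloc(Obj *self)
{
    if (self->child != NULL)
        obj_decref(self->child);
    free(self);
    live_objects--;
}

static const Type obj_type = { obj_dealloc };

/* Returns a new object with a reference count of one, or NULL if memory
   runs out.  The new object takes its own reference to child, if any. */
static Obj *obj_new(Obj *child)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcnt = 1;
    o->type = &obj_type;
    o->child = child;
    if (child != NULL)
        obj_incref(child);
    live_objects++;
    return o;
}
```

In this model, dropping the last reference to a parent object also
releases the parent's reference to its child, so the child is
deallocated in turn unless some other place still refers to it.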

It is not necessary to increment an object's reference count for every
local variable that contains a pointer to an object. In theory, the
object's reference count goes up by one when the variable is made to
point to it and it goes down by one when the variable goes out of
scope. However, these two cancel each other out, so at the end the
reference count hasn't changed. The only real reason to use the
reference count is to prevent the object from being deallocated as
long as our variable is pointing to it. If we know that there is at
least one other reference to the object that lives at least as long as
our variable, there is no need to increment the reference count
temporarily. An important situation where this arises is in objects
that are passed as arguments to C functions in an extension module
that are called from Python; the call mechanism guarantees to hold a
reference to every argument for the duration of the call.

However, a common pitfall is to extract an object from a list and
hold on to it for a while without incrementing its reference count.
Some other operation might conceivably remove the object from the
list, decrementing its reference count and possibly deallocating it.
The real danger is that innocent-looking operations may invoke
arbitrary Python code which could do this; there is a code path which
allows control to flow back to the user from a \cfunction{Py_DECREF()},
so almost any operation is potentially dangerous.
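
The pitfall and its remedy can be illustrated without the
interpreter. The sketch below assumes a hypothetical miniature object
model: \ctype{Obj}, the one-item container \ctype{Box}, and the
helpers \cfunction{box_get()} and \cfunction{box_set()} are invented
for this illustration. \cfunction{box_get()} plays the role of a
function that hands out a pointer without transferring ownership.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature object model, for illustration only. */
typedef struct { long refcnt; int value; } Obj;
typedef struct { Obj *slot; } Box;          /* a one-item "list" */

static int live_objects = 0;                /* demonstration bookkeeping */

static Obj *obj_new(int value)
{
    Obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcnt = 1;
    o->value = value;
    live_objects++;
    return o;
}

static void obj_incref(Obj *o) { o->refcnt++; }

static void obj_decref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        free(o);
        live_objects--;
    }
}

/* Hands out the stored pointer; the box keeps ownership. */
static Obj *box_get(Box *b) { return b->slot; }

/* Replaces the item, taking over the caller's reference to item and
   releasing the box's reference to the previous item. */
static void box_set(Box *b, Obj *item)
{
    Obj *old = b->slot;
    b->slot = item;
    if (old != NULL)
        obj_decref(old);
}

/* Safe pattern: hold our own reference for as long as the pointer is
   used.  Returns the item's value, or -1 if the replacement item could
   not be allocated. */
static int read_after_replace(Box *b)
{
    Obj *item = box_get(b);       /* we do not own this reference yet */
    Obj *replacement;
    int value;

    obj_incref(item);             /* now we own a reference */
    replacement = obj_new(0);
    if (replacement == NULL) {
        obj_decref(item);
        return -1;
    }
    box_set(b, replacement);      /* would free item without the incref */
    value = item->value;          /* still valid: we hold a reference */
    obj_decref(item);             /* our reference was the last one */
    return value;
}
```

Without the call to \cfunction{obj_incref()}, the call to
\cfunction{box_set()} would drop the only reference to the old item,
and reading \code{item->value} afterwards would access freed memory.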

A safe approach is to always use the generic operations (functions
whose name begins with \samp{PyObject_}, \samp{PyNumber_},
\samp{PySequence_} or \samp{PyMapping_}). These operations always
increment the reference count of the object they return. This leaves
the caller with the responsibility to call
\cfunction{Py_DECREF()} when they are done with the result; this soon
becomes second nature.


\subsubsection{Reference Count Details \label{refcountDetails}}

The reference count behavior of functions in the Python/C API is best
explained in terms of \emph{ownership of references}. Ownership
pertains to references, never to objects (objects are not owned: they
are always shared). ``Owning a reference'' means being responsible for
calling \cfunction{Py_DECREF()} on it when the reference is no longer
needed. Ownership can also be transferred, meaning that the code that
receives ownership of the reference then becomes responsible for
eventually decref'ing it by calling \cfunction{Py_DECREF()} or
\cfunction{Py_XDECREF()} when it's no longer needed, or for passing on
this responsibility (usually to its caller).
When a function passes ownership of a reference on to its caller, the
caller is said to receive a \emph{new} reference. When no ownership
is transferred, the caller is said to \emph{borrow} the reference.
Nothing needs to be done for a borrowed reference.

Conversely, when a calling function passes in a reference to an
object, there are two possibilities: the function \emph{steals} a
reference to the object, or it does not. Few functions steal
references; the two notable exceptions are
\cfunction{PyList_SetItem()}\ttindex{PyList_SetItem()} and
\cfunction{PyTuple_SetItem()}\ttindex{PyTuple_SetItem()}, which
steal a reference to the item (but not to the tuple or list into which
the item is put!). These functions were designed to steal a reference
because of a common idiom for populating a tuple or list with newly
created objects; for example, the code to create the tuple \code{(1,
2, "three")} could look like this (forgetting about error handling for
the moment; a better way to code this is shown below):

\begin{verbatim}
PyObject *t;

t = PyTuple_New(3);
PyTuple_SetItem(t, 0, PyInt_FromLong(1L));
PyTuple_SetItem(t, 1, PyInt_FromLong(2L));
PyTuple_SetItem(t, 2, PyString_FromString("three"));
\end{verbatim}

Incidentally, \cfunction{PyTuple_SetItem()} is the \emph{only} way to
set tuple items; \cfunction{PySequence_SetItem()} and
\cfunction{PyObject_SetItem()} refuse to do this since tuples are an
immutable data type. You should only use
\cfunction{PyTuple_SetItem()} for tuples that you are creating
yourself.

Equivalent code for populating a list can be written using
\cfunction{PyList_New()} and \cfunction{PyList_SetItem()}. Such code
can also use \cfunction{PySequence_SetItem()}; this illustrates the
difference between the two (the extra \cfunction{Py_DECREF()} calls):

\begin{verbatim}
PyObject *l, *x;

l = PyList_New(3);
x = PyInt_FromLong(1L);
PySequence_SetItem(l, 0, x); Py_DECREF(x);
x = PyInt_FromLong(2L);
PySequence_SetItem(l, 1, x); Py_DECREF(x);
x = PyString_FromString("three");
PySequence_SetItem(l, 2, x); Py_DECREF(x);
\end{verbatim}

You might find it strange that the ``recommended'' approach takes more
code. However, in practice, you will rarely use these ways of
creating and populating a tuple or list. There's a generic function,
\cfunction{Py_BuildValue()}, that can create most common objects from
C values, directed by a \dfn{format string}. For example, the
above two blocks of code could be replaced by the following (which
also takes care of the error checking):

\begin{verbatim}
PyObject *t, *l;

t = Py_BuildValue("(iis)", 1, 2, "three");
l = Py_BuildValue("[iis]", 1, 2, "three");
\end{verbatim}

It is much more common to use \cfunction{PyObject_SetItem()} and
friends with items whose references you are only borrowing, like
arguments that were passed in to the function you are writing. In
that case, their behavior regarding reference counts is much saner,
since you don't have to increment a reference count so you can give a
reference away (``have it be stolen''). For example, this function
sets all items of a list (actually, any mutable sequence) to a given
item:

\begin{verbatim}
int
set_all(PyObject *target, PyObject *item)
{
    int i, n;

    n = PyObject_Length(target);
    if (n < 0)
        return -1;
    for (i = 0; i < n; i++) {
        /* PyObject_SetItem() expects the key as an object */
        PyObject *index = PyInt_FromLong(i);
        if (index == NULL)
            return -1;
        if (PyObject_SetItem(target, index, item) < 0) {
            Py_DECREF(index);
            return -1;
        }
        Py_DECREF(index);
    }
    return 0;
}
\end{verbatim}
\ttindex{set_all()}

The situation is slightly different for function return values.
While passing a reference to most functions does not change your
ownership responsibilities for that reference, many functions that
return a reference to an object give you ownership of the reference.
The reason is simple: in many cases, the returned object is created
on the fly, and the reference you get is the only reference to the
object. Therefore, the generic functions that return object
references, like \cfunction{PyObject_GetItem()} and
\cfunction{PySequence_GetItem()}, always return a new reference (the
caller becomes the owner of the reference).

It is important to realize that whether you own a reference returned
by a function depends on which function you call only --- \emph{the
plumage} (the type of the object passed as an
argument to the function) \emph{doesn't enter into it!} Thus, if you
extract an item from a list using \cfunction{PyList_GetItem()}, you
don't own the reference --- but if you obtain the same item from the
same list using \cfunction{PySequence_GetItem()} (which happens to
take exactly the same arguments), you do own a reference to the
returned object.

Here is an example of how you could write a function that computes the
sum of the items in a list of integers; once using
\cfunction{PyList_GetItem()}\ttindex{PyList_GetItem()}, and once using
\cfunction{PySequence_GetItem()}\ttindex{PySequence_GetItem()}.

\begin{verbatim}
long
sum_list(PyObject *list)
{
    int i, n;
    long total = 0;
    PyObject *item;

    n = PyList_Size(list);
    if (n < 0)
        return -1; /* Not a list */
    for (i = 0; i < n; i++) {
        item = PyList_GetItem(list, i); /* Can't fail */
        if (!PyInt_Check(item)) continue; /* Skip non-integers */
        total += PyInt_AsLong(item);
    }
    return total;
}
\end{verbatim}
\ttindex{sum_list()}

\begin{verbatim}
long
sum_sequence(PyObject *sequence)
{
    int i, n;
    long total = 0;
    PyObject *item;
    n = PySequence_Length(sequence);
    if (n < 0)
        return -1; /* Has no length */
    for (i = 0; i < n; i++) {
        item = PySequence_GetItem(sequence, i);
        if (item == NULL)
            return -1; /* Not a sequence, or other failure */
        if (PyInt_Check(item))
            total += PyInt_AsLong(item);
        Py_DECREF(item); /* Discard reference ownership */
    }
    return total;
}
\end{verbatim}
\ttindex{sum_sequence()}


\subsection{Types \label{types}}

There are few other data types that play a significant role in
the Python/C API; most are simple C types such as \ctype{int},
\ctype{long}, \ctype{double} and \ctype{char*}. A few structure types
are used to describe static tables used to list the functions exported
by a module or the data attributes of a new object type, and another
is used to describe the value of a complex number. These will
be discussed together with the functions that use them.


\section{Exceptions \label{exceptions}}

The Python programmer only needs to deal with exceptions if specific
error handling is required; unhandled exceptions are automatically
propagated to the caller, then to the caller's caller, and so on, until
they reach the top-level interpreter, where they are reported to the
user accompanied by a stack traceback.

For C programmers, however, error checking always has to be explicit.
All functions in the Python/C API can raise exceptions, unless an
explicit claim is made otherwise in a function's documentation. In
general, when a function encounters an error, it sets an exception,
discards any object references that it owns, and returns an
error indicator --- usually \NULL{} or \code{-1}. A few functions
return a Boolean true/false result, with false indicating an error.
Very few functions return no explicit error indicator or have an
ambiguous return value, and require explicit testing for errors with
\cfunction{PyErr_Occurred()}\ttindex{PyErr_Occurred()}.

Exception state is maintained in per-thread storage (this is
equivalent to using global storage in an unthreaded application). A
thread can be in one of two states: an exception has occurred, or not.
The function \cfunction{PyErr_Occurred()} can be used to check for
this: it returns a borrowed reference to the exception type object
when an exception has occurred, and \NULL{} otherwise. There are a
number of functions to set the exception state:
\cfunction{PyErr_SetString()}\ttindex{PyErr_SetString()} is the most
common (though not the most general) function to set the exception
state, and \cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} clears the
exception state.

The full exception state consists of three objects (all of which can
be \NULL): the exception type, the corresponding exception
value, and the traceback. These have the same meanings as the Python
\withsubitem{(in module sys)}{
  \ttindex{exc_type}\ttindex{exc_value}\ttindex{exc_traceback}}
objects \code{sys.exc_type}, \code{sys.exc_value}, and
\code{sys.exc_traceback}; however, they are not the same: the Python
objects represent the last exception being handled by a Python
\keyword{try} \ldots\ \keyword{except} statement, while the C level
exception state only exists while an exception is being passed on
between C functions until it reaches the Python bytecode interpreter's
main loop, which takes care of transferring it to \code{sys.exc_type}
and friends.

Note that starting with Python 1.5, the preferred, thread-safe way to
access the exception state from Python code is to call the function
\withsubitem{(in module sys)}{\ttindex{exc_info()}}
\function{sys.exc_info()}, which returns the per-thread exception state
for Python code. Also, the semantics of both ways to access the
exception state have changed so that a function which catches an
exception will save and restore its thread's exception state so as to
preserve the exception state of its caller. This prevents common bugs
in exception handling code caused by an innocent-looking function
overwriting the exception being handled; it also reduces the often
unwanted lifetime extension for objects that are referenced by the
stack frames in the traceback.

As a general principle, a function that calls another function to
perform some task should check whether the called function raised an
exception, and if so, pass the exception state on to its caller. It
should discard any object references that it owns, and return an
error indicator, but it should \emph{not} set another exception ---
that would overwrite the exception that was just raised, and lose
important information about the exact cause of the error.
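
The convention can be reduced to a small model in plain C. This is
only a sketch under stated assumptions: a single static variable
stands in for the per-thread exception state, and
\cfunction{err_set()}, \cfunction{err_occurred()},
\cfunction{err_clear()}, \cfunction{parse_digit()} and
\cfunction{sum_digits()} are hypothetical names that merely mirror the
roles of \cfunction{PyErr_SetString()}, \cfunction{PyErr_Occurred()}
and \cfunction{PyErr_Clear()}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread exception state; a single
   static variable is the unthreaded equivalent. */
static const char *current_error = NULL;

static void err_set(const char *message)
{
    assert(message != NULL);
    current_error = message;
}

static const char *err_occurred(void) { return current_error; }

static void err_clear(void) { current_error = NULL; }

/* Low-level function: sets an exception and returns -1 on failure. */
static int parse_digit(char c)
{
    if (c < '0' || c > '9') {
        err_set("not a digit");
        return -1;
    }
    return c - '0';
}

/* Caller: checks the error indicator and passes the failure on.  It
   does not call err_set() again, so the original cause is preserved. */
static int sum_digits(const char *text)
{
    int total = 0;

    for (; *text != '\0'; text++) {
        int digit = parse_digit(*text);
        if (digit < 0)
            return -1;            /* propagate; exception already set */
        total += digit;
    }
    return total;
}
```

A caller of \cfunction{sum_digits()} that receives \code{-1} can
inspect \cfunction{err_occurred()} to find the original cause, and
calls \cfunction{err_clear()} only if it decides to handle the error
itself.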

A simple example of detecting exceptions and passing them on is shown
in the \cfunction{sum_sequence()}\ttindex{sum_sequence()} example
above. It so happens that this example doesn't need to clean up any
owned references when it detects an error. The following example
function shows some error cleanup. First, to remind you why you like
Python, we show the equivalent Python code:

\begin{verbatim}
def incr_item(dict, key):
    try:
        item = dict[key]
    except KeyError:
        item = 0
    dict[key] = item + 1
\end{verbatim}
\ttindex{incr_item()}

Here is the corresponding C code, in all its glory:

\begin{verbatim}
int
incr_item(PyObject *dict, PyObject *key)
{
    /* Objects all initialized to NULL for Py_XDECREF */
    PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL;
    int rv = -1; /* Return value initialized to -1 (failure) */

    item = PyObject_GetItem(dict, key);
    if (item == NULL) {
        /* Handle KeyError only: */
        if (!PyErr_ExceptionMatches(PyExc_KeyError))
            goto error;

        /* Clear the error and use zero: */
        PyErr_Clear();
        item = PyInt_FromLong(0L);
        if (item == NULL)
            goto error;
    }
    const_one = PyInt_FromLong(1L);
    if (const_one == NULL)
        goto error;

    incremented_item = PyNumber_Add(item, const_one);
    if (incremented_item == NULL)
        goto error;

    if (PyObject_SetItem(dict, key, incremented_item) < 0)
        goto error;
    rv = 0; /* Success */
    /* Continue with cleanup code */

 error:
    /* Cleanup code, shared by success and failure path */

    /* Use Py_XDECREF() to ignore NULL references */
    Py_XDECREF(item);
    Py_XDECREF(const_one);
    Py_XDECREF(incremented_item);

    return rv; /* -1 for error, 0 for success */
}
\end{verbatim}
\ttindex{incr_item()}

This example represents an endorsed use of the \keyword{goto} statement
in C! It illustrates the use of
\cfunction{PyErr_ExceptionMatches()}\ttindex{PyErr_ExceptionMatches()} and
\cfunction{PyErr_Clear()}\ttindex{PyErr_Clear()} to
handle specific exceptions, and the use of
\cfunction{Py_XDECREF()}\ttindex{Py_XDECREF()} to
dispose of owned references that may be \NULL{} (note the
\character{X} in the name; \cfunction{Py_DECREF()} would crash when
confronted with a \NULL{} reference). It is important that the
variables used to hold owned references are initialized to \NULL{} for
this to work; likewise, the proposed return value is initialized to
\code{-1} (failure) and only set to success after the final call made
is successful.


\section{Embedding Python \label{embedding}}

The one important task that only embedders (as opposed to extension
writers) of the Python interpreter have to worry about is the
initialization, and possibly the finalization, of the Python
interpreter. Most functionality of the interpreter can only be used
after the interpreter has been initialized.

The basic initialization function is
\cfunction{Py_Initialize()}\ttindex{Py_Initialize()}.
This initializes the table of loaded modules, and creates the
fundamental modules \module{__builtin__}\refbimodindex{__builtin__},
\module{__main__}\refbimodindex{__main__}, \module{sys}\refbimodindex{sys},
and \module{exceptions}.\refbimodindex{exceptions} It also initializes
the module search path (\code{sys.path}).%
\indexiii{module}{search}{path}
\withsubitem{(in module sys)}{\ttindex{path}}

\cfunction{Py_Initialize()} does not set the ``script argument list''
(\code{sys.argv}). If this variable is needed by Python code that
will be executed later, it must be set explicitly with a call to
\code{PySys_SetArgv(\var{argc},
\var{argv})}\ttindex{PySys_SetArgv()} subsequent to the call to
\cfunction{Py_Initialize()}.

On most systems (in particular, on \UNIX{} and Windows, although the
details are slightly different),
\cfunction{Py_Initialize()} calculates the module search path based
upon its best guess for the location of the standard Python
interpreter executable, assuming that the Python library is found in a
fixed location relative to the Python interpreter executable. In
particular, it looks for a directory named
\file{lib/python\shortversion} relative to the parent directory where
the executable named \file{python} is found on the shell command
search path (the environment variable \envvar{PATH}).

For instance, if the Python executable is found in
\file{/usr/local/bin/python}, it will assume that the libraries are in
\file{/usr/local/lib/python\shortversion}. (In fact, this particular path
is also the ``fallback'' location, used when no executable file named
\file{python} is found along \envvar{PATH}.) The user can override
this behavior by setting the environment variable \envvar{PYTHONHOME},
or insert additional directories in front of the standard path by
setting \envvar{PYTHONPATH}.

The embedding application can steer the search by calling
\code{Py_SetProgramName(\var{file})}\ttindex{Py_SetProgramName()}
\emph{before} calling
\cfunction{Py_Initialize()}. Note that \envvar{PYTHONHOME} still
overrides this and \envvar{PYTHONPATH} is still inserted in front of
the standard path. An application that requires total control has to
provide its own implementation of
\cfunction{Py_GetPath()}\ttindex{Py_GetPath()},
\cfunction{Py_GetPrefix()}\ttindex{Py_GetPrefix()},
\cfunction{Py_GetExecPrefix()}\ttindex{Py_GetExecPrefix()}, and
\cfunction{Py_GetProgramFullPath()}\ttindex{Py_GetProgramFullPath()} (all
defined in \file{Modules/getpath.c}).

Sometimes, it is desirable to ``uninitialize'' Python. For instance,
the application may want to start over (make another call to
\cfunction{Py_Initialize()}) or the application is simply done with its
use of Python and wants to free all memory allocated by Python. This
can be accomplished by calling \cfunction{Py_Finalize()}. The function
\cfunction{Py_IsInitialized()}\ttindex{Py_IsInitialized()} returns
true if Python is currently in the initialized state. More
information about these functions is given in a later chapter.
569information about these functions is given in a later chapter.