| \chapter{Defining New Types |
| \label{defining-new-types}} |
| \sectionauthor{Michael Hudson}{mwh@python.net} |
| \sectionauthor{Dave Kuhlman}{dkuhlman@rexx.com} |
| \sectionauthor{Jim Fulton}{jim@zope.com} |
| |
| As mentioned in the last chapter, Python allows the writer of an |
| extension module to define new types that can be manipulated from |
| Python code, much like strings and lists in core Python. |
| |
| This is not hard; the code for all extension types follows a pattern, |
| but there are some details that you need to understand before you can |
| get started. |
| |
| \begin{notice} |
| The way new types are defined changed dramatically (and for the |
| better) in Python 2.2. This document documents how to define new |
| types for Python 2.2 and later. If you need to support older |
| versions of Python, you will need to refer to |
| \ulink{older versions of this documentation} |
| {http://www.python.org/doc/versions/}. |
| \end{notice} |
| |
| \section{The Basics |
| \label{dnt-basics}} |
| |
| The Python runtime sees all Python objects as variables of type |
| \ctype{PyObject*}. A \ctype{PyObject} is not a very magnificent |
| object - it just contains the refcount and a pointer to the object's |
| ``type object''. This is where the action is; the type object |
| determines which (C) functions get called when, for instance, an |
| attribute gets looked up on an object or it is multiplied by another |
| object. These C functions are called ``type methods'' to distinguish |
| them from things like \code{[].append} (which we call ``object |
| methods''). |
| |
| So, if you want to define a new object type, you need to create a new |
| type object. |
| |
| This sort of thing can only be explained by example, so here's a |
| minimal, but complete, module that defines a new type: |
| |
| \verbatiminput{noddy.c} |
| |
| Now that's quite a bit to take in at once, but hopefully bits will |
| seem familiar from the last chapter. |
| |
| The first bit that will be new is: |
| |
| \begin{verbatim} |
| typedef struct { |
| PyObject_HEAD |
| } noddy_NoddyObject; |
| \end{verbatim} |
| |
| This is what a Noddy object will contain---in this case, nothing more |
| than every Python object contains, namely a refcount and a pointer to a type |
| object. These are the fields the \code{PyObject_HEAD} macro brings |
| in. The reason for the macro is to standardize the layout and to |
| enable special debugging fields in debug builds. Note that there is |
| no semicolon after the \code{PyObject_HEAD} macro; one is included in |
| the macro definition. Be wary of adding one by accident; it's easy to |
| do from habit, and your compiler might not complain, but someone |
| else's probably will! (On Windows, MSVC is known to call this an |
| error and refuse to compile the code.) |
| |
| For contrast, let's take a look at the corresponding definition for |
| standard Python integers: |
| |
| \begin{verbatim} |
| typedef struct { |
| PyObject_HEAD |
| long ob_ival; |
| } PyIntObject; |
| \end{verbatim} |
| |
| Moving on, we come to the crunch --- the type object. |
| |
| \begin{verbatim} |
| static PyTypeObject noddy_NoddyType = { |
| PyObject_HEAD_INIT(NULL) |
| 0, /*ob_size*/ |
| "noddy.Noddy", /*tp_name*/ |
| sizeof(noddy_NoddyObject), /*tp_basicsize*/ |
| 0, /*tp_itemsize*/ |
| 0, /*tp_dealloc*/ |
| 0, /*tp_print*/ |
| 0, /*tp_getattr*/ |
| 0, /*tp_setattr*/ |
| 0, /*tp_compare*/ |
| 0, /*tp_repr*/ |
| 0, /*tp_as_number*/ |
| 0, /*tp_as_sequence*/ |
| 0, /*tp_as_mapping*/ |
| 0, /*tp_hash */ |
| 0, /*tp_call*/ |
| 0, /*tp_str*/ |
| 0, /*tp_getattro*/ |
| 0, /*tp_setattro*/ |
| 0, /*tp_as_buffer*/ |
| Py_TPFLAGS_DEFAULT, /*tp_flags*/ |
| "Noddy objects", /* tp_doc */ |
| }; |
| \end{verbatim} |
| |
| Now if you go and look up the definition of \ctype{PyTypeObject} in |
| \file{object.h} you'll see that it has many more fields that the |
| definition above. The remaining fields will be filled with zeros by |
| the C compiler, and it's common practice to not specify them |
| explicitly unless you need them. |
| |
| This is so important that we're going to pick the top of it apart still |
| further: |
| |
| \begin{verbatim} |
| PyObject_HEAD_INIT(NULL) |
| \end{verbatim} |
| |
| This line is a bit of a wart; what we'd like to write is: |
| |
| \begin{verbatim} |
| PyObject_HEAD_INIT(&PyType_Type) |
| \end{verbatim} |
| |
| as the type of a type object is ``type'', but this isn't strictly |
| conforming C and some compilers complain. Fortunately, this member |
| will be filled in for us by \cfunction{PyType_Ready()}. |
| |
| \begin{verbatim} |
| 0, /* ob_size */ |
| \end{verbatim} |
| |
| The \member{ob_size} field of the header is not used; its presence in |
| the type structure is a historical artifact that is maintained for |
| binary compatibility with extension modules compiled for older |
| versions of Python. Always set this field to zero. |
| |
| \begin{verbatim} |
| "noddy.Noddy", /* tp_name */ |
| \end{verbatim} |
| |
| The name of our type. This will appear in the default textual |
| representation of our objects and in some error messages, for example: |
| |
| \begin{verbatim} |
| >>> "" + noddy.new_noddy() |
| Traceback (most recent call last): |
| File "<stdin>", line 1, in ? |
| TypeError: cannot add type "noddy.Noddy" to string |
| \end{verbatim} |
| |
| Note that the name is a dotted name that includes both the module name |
| and the name of the type within the module. The module in this case is |
| \module{noddy} and the type is \class{Noddy}, so we set the type name |
| to \class{noddy.Noddy}. |
| |
| \begin{verbatim} |
| sizeof(noddy_NoddyObject), /* tp_basicsize */ |
| \end{verbatim} |
| |
| This is so that Python knows how much memory to allocate when you call |
| \cfunction{PyObject_New()}. |
| |
| \note{If you want your type to be subclassable from Python, and your |
| type has the same \member{tp_basicsize} as its base type, you may |
| have problems with multiple inheritance. A Python subclass of your |
| type will have to list your type first in its \member{__bases__}, or |
| else it will not be able to call your type's \method{__new__} method |
| without getting an error. You can avoid this problem by ensuring |
| that your type has a larger value for \member{tp_basicsize} than |
| its base type does. Most of the time, this will be true anyway, |
| because either your base type will be \class{object}, or else you will |
| be adding data members to your base type, and therefore increasing its |
| size.} |
| |
| \begin{verbatim} |
| 0, /* tp_itemsize */ |
| \end{verbatim} |
| |
| This has to do with variable length objects like lists and strings. |
| Ignore this for now. |
| |
| Skipping a number of type methods that we don't provide, we set the |
| class flags to \constant{Py_TPFLAGS_DEFAULT}. |
| |
| \begin{verbatim} |
| Py_TPFLAGS_DEFAULT, /*tp_flags*/ |
| \end{verbatim} |
| |
| All types should include this constant in their flags. It enables all |
| of the members defined by the current version of Python. |
| |
| We provide a doc string for the type in \member{tp_doc}. |
| |
| \begin{verbatim} |
| "Noddy objects", /* tp_doc */ |
| \end{verbatim} |
| |
| Now we get into the type methods, the things that make your objects |
| different from the others. We aren't going to implement any of these |
| in this version of the module. We'll expand this example later to |
| have more interesting behavior. |
| |
| For now, all we want to be able to do is to create new \class{Noddy} |
| objects. To enable object creation, we have to provide a |
| \member{tp_new} implementation. In this case, we can just use the |
| default implementation provided by the API function |
| \cfunction{PyType_GenericNew()}. We'd like to just assign this to the |
| \member{tp_new} slot, but we can't, for portability sake, On some |
| platforms or compilers, we can't statically initialize a structure |
| member with a function defined in another C module, so, instead, we'll |
| assign the \member{tp_new} slot in the module initialization function |
| just before calling \cfunction{PyType_Ready()}: |
| |
| \begin{verbatim} |
| noddy_NoddyType.tp_new = PyType_GenericNew; |
| if (PyType_Ready(&noddy_NoddyType) < 0) |
| return; |
| \end{verbatim} |
| |
| All the other type methods are \NULL, so we'll go over them later |
| --- that's for a later section! |
| |
| Everything else in the file should be familiar, except for some code |
| in \cfunction{initnoddy()}: |
| |
| \begin{verbatim} |
| if (PyType_Ready(&noddy_NoddyType) < 0) |
| return; |
| \end{verbatim} |
| |
| This initializes the \class{Noddy} type, filing in a number of |
| members, including \member{ob_type} that we initially set to \NULL. |
| |
| \begin{verbatim} |
| PyModule_AddObject(m, "Noddy", (PyObject *)&noddy_NoddyType); |
| \end{verbatim} |
| |
| This adds the type to the module dictionary. This allows us to create |
| \class{Noddy} instances by calling the \class{Noddy} class: |
| |
| \begin{verbatim} |
| >>> import noddy |
| >>> mynoddy = noddy.Noddy() |
| \end{verbatim} |
| |
| That's it! All that remains is to build it; put the above code in a |
| file called \file{noddy.c} and |
| |
| \begin{verbatim} |
| from distutils.core import setup, Extension |
| setup(name="noddy", version="1.0", |
| ext_modules=[Extension("noddy", ["noddy.c"])]) |
| \end{verbatim} |
| |
| in a file called \file{setup.py}; then typing |
| |
| \begin{verbatim} |
| $ python setup.py build |
| \end{verbatim} %$ <-- bow to font-lock ;-( |
| |
| at a shell should produce a file \file{noddy.so} in a subdirectory; |
| move to that directory and fire up Python --- you should be able to |
| \code{import noddy} and play around with Noddy objects. |
| |
| That wasn't so hard, was it? |
| |
| Of course, the current Noddy type is pretty uninteresting. It has no |
| data and doesn't do anything. It can't even be subclassed. |
| |
| \subsection{Adding data and methods to the Basic example} |
| |
| Let's expend the basic example to add some data and methods. Let's |
| also make the type usable as a base class. We'll create |
| a new module, \module{noddy2} that adds these capabilities: |
| |
| \verbatiminput{noddy2.c} |
| |
| This version of the module has a number of changes. |
| |
| We've added an extra include: |
| |
| \begin{verbatim} |
| #include "structmember.h" |
| \end{verbatim} |
| |
| This include provides declarations that we use to handle attributes, |
| as described a bit later. |
| |
| The name of the \class{Noddy} object structure has been shortened to |
| \class{Noddy}. The type object name has been shortened to |
| \class{NoddyType}. |
| |
| The \class{Noddy} type now has three data attributes, \var{first}, |
| \var{last}, and \var{number}. The \var{first} and \var{last} |
| variables are Python strings containing first and last names. The |
| \var{number} attribute is an integer. |
| |
| The object structure is updated accordingly: |
| |
| \begin{verbatim} |
| typedef struct { |
| PyObject_HEAD |
| PyObject *first; |
| PyObject *last; |
| int number; |
| } Noddy; |
| \end{verbatim} |
| |
| Because we now have data to manage, we have to be more careful about |
| object allocation and deallocation. At a minimum, we need a |
| deallocation method: |
| |
| \begin{verbatim} |
| static void |
| Noddy_dealloc(Noddy* self) |
| { |
| Py_XDECREF(self->first); |
| Py_XDECREF(self->last); |
| self->ob_type->tp_free((PyObject*)self); |
| } |
| \end{verbatim} |
| |
| which is assigned to the \member{tp_dealloc} member: |
| |
| \begin{verbatim} |
| (destructor)Noddy_dealloc, /*tp_dealloc*/ |
| \end{verbatim} |
| |
| This method decrements the reference counts of the two Python |
| attributes. We use \cfunction{Py_XDECREF()} here because the |
| \member{first} and \member{last} members could be \NULL. It then |
| calls the \member{tp_free} member of the object's type to free the |
| object's memory. Note that the object's type might not be |
| \class{NoddyType}, because the object may be an instance of a |
| subclass. |
| |
| We want to make sure that the first and last names are initialized to |
| empty strings, so we provide a new method: |
| |
| \begin{verbatim} |
| static PyObject * |
| Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds) |
| { |
| Noddy *self; |
| |
| self = (Noddy *)type->tp_alloc(type, 0); |
| if (self != NULL) { |
| self->first = PyString_FromString(""); |
| if (self->first == NULL) |
| { |
| Py_DECREF(self); |
| return NULL; |
| } |
| |
| self->last = PyString_FromString(""); |
| if (self->last == NULL) |
| { |
| Py_DECREF(self); |
| return NULL; |
| } |
| |
| self->number = 0; |
| } |
| |
| return (PyObject *)self; |
| } |
| \end{verbatim} |
| |
| and install it in the \member{tp_new} member: |
| |
| \begin{verbatim} |
| Noddy_new, /* tp_new */ |
| \end{verbatim} |
| |
| The new member is responsible for creating (as opposed to |
| initializing) objects of the type. It is exposed in Python as the |
| \method{__new__()} method. See the paper titled ``Unifying types and |
| classes in Python'' for a detailed discussion of the \method{__new__()} |
| method. One reason to implement a new method is to assure the initial |
| values of instance variables. In this case, we use the new method to |
| make sure that the initial values of the members \member{first} and |
| \member{last} are not \NULL. If we didn't care whether the initial |
| values were \NULL, we could have used \cfunction{PyType_GenericNew()} as |
| our new method, as we did before. \cfunction{PyType_GenericNew()} |
| initializes all of the instance variable members to \NULL. |
| |
| The new method is a static method that is passed the type being |
| instantiated and any arguments passed when the type was called, |
| and that returns the new object created. New methods always accept |
| positional and keyword arguments, but they often ignore the arguments, |
| leaving the argument handling to initializer methods. Note that if the |
| type supports subclassing, the type passed may not be the type being |
| defined. The new method calls the tp_alloc slot to allocate memory. |
| We don't fill the \member{tp_alloc} slot ourselves. Rather |
| \cfunction{PyType_Ready()} fills it for us by inheriting it from our |
| base class, which is \class{object} by default. Most types use the |
| default allocation. |
| |
| \note{If you are creating a co-operative \member{tp_new} (one that |
| calls a base type's \member{tp_new} or \method{__new__}), you |
| must \emph{not} try to determine what method to call using |
| method resolution order at runtime. Always statically determine |
| what type you are going to call, and call its \member{tp_new} |
| directly, or via \code{type->tp_base->tp_new}. If you do |
| not do this, Python subclasses of your type that also inherit |
| from other Python-defined classes may not work correctly. |
| (Specifically, you may not be able to create instances of |
| such subclasses without getting a \exception{TypeError}.)} |
| |
| We provide an initialization function: |
| |
| \begin{verbatim} |
| static int |
| Noddy_init(Noddy *self, PyObject *args, PyObject *kwds) |
| { |
| PyObject *first=NULL, *last=NULL, *tmp; |
| |
| static char *kwlist[] = {"first", "last", "number", NULL}; |
| |
| if (! PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist, |
| &first, &last, |
| &self->number)) |
| return -1; |
| |
| if (first) { |
| tmp = self->first; |
| Py_INCREF(first); |
| self->first = first; |
| Py_XDECREF(tmp); |
| } |
| |
| if (last) { |
| tmp = self->last; |
| Py_INCREF(last); |
| self->last = last; |
| Py_XDECREF(tmp); |
| } |
| |
| return 0; |
| } |
| \end{verbatim} |
| |
| by filling the \member{tp_init} slot. |
| |
| \begin{verbatim} |
| (initproc)Noddy_init, /* tp_init */ |
| \end{verbatim} |
| |
| The \member{tp_init} slot is exposed in Python as the |
| \method{__init__()} method. It is used to initialize an object after |
| it's created. Unlike the new method, we can't guarantee that the |
| initializer is called. The initializer isn't called when unpickling |
| objects and it can be overridden. Our initializer accepts arguments |
| to provide initial values for our instance. Initializers always accept |
| positional and keyword arguments. |
| |
| Initializers can be called multiple times. Anyone can call the |
| \method{__init__()} method on our objects. For this reason, we have |
| to be extra careful when assigning the new values. We might be |
| tempted, for example to assign the \member{first} member like this: |
| |
| \begin{verbatim} |
| if (first) { |
| Py_XDECREF(self->first); |
| Py_INCREF(first); |
| self->first = first; |
| } |
| \end{verbatim} |
| |
| But this would be risky. Our type doesn't restrict the type of the |
| \member{first} member, so it could be any kind of object. It could |
| have a destructor that causes code to be executed that tries to |
| access the \member{first} member. To be paranoid and protect |
| ourselves against this possibility, we almost always reassign members |
| before decrementing their reference counts. When don't we have to do |
| this? |
| \begin{itemize} |
| \item when we absolutely know that the reference count is greater than |
| 1 |
| \item when we know that deallocation of the object\footnote{This is |
| true when we know that the object is a basic type, like a string or |
| a float.} will not cause any |
| calls back into our type's code |
| \item when decrementing a reference count in a \member{tp_dealloc} |
| handler when garbage-collections is not supported\footnote{We relied |
| on this in the \member{tp_dealloc} handler in this example, because |
| our type doesn't support garbage collection. Even if a type supports |
| garbage collection, there are calls that can be made to ``untrack'' |
| the object from garbage collection, however, these calls are |
| advanced and not covered here.} |
| \item |
| \end{itemize} |
| |
| |
| We want to want to expose our instance variables as attributes. There |
| are a number of ways to do that. The simplest way is to define member |
| definitions: |
| |
| \begin{verbatim} |
| static PyMemberDef Noddy_members[] = { |
| {"first", T_OBJECT_EX, offsetof(Noddy, first), 0, |
| "first name"}, |
| {"last", T_OBJECT_EX, offsetof(Noddy, last), 0, |
| "last name"}, |
| {"number", T_INT, offsetof(Noddy, number), 0, |
| "noddy number"}, |
| {NULL} /* Sentinel */ |
| }; |
| \end{verbatim} |
| |
| and put the definitions in the \member{tp_members} slot: |
| |
| \begin{verbatim} |
| Noddy_members, /* tp_members */ |
| \end{verbatim} |
| |
| Each member definition has a member name, type, offset, access flags |
| and documentation string. See the ``Generic Attribute Management'' |
| section below for details. |
| |
| A disadvantage of this approach is that it doesn't provide a way to |
| restrict the types of objects that can be assigned to the Python |
| attributes. We expect the first and last names to be strings, but any |
| Python objects can be assigned. Further, the attributes can be |
| deleted, setting the C pointers to \NULL. Even though we can make |
| sure the members are initialized to non-\NULL{} values, the members can |
| be set to \NULL{} if the attributes are deleted. |
| |
| We define a single method, \method{name}, that outputs the objects |
| name as the concatenation of the first and last names. |
| |
| \begin{verbatim} |
| static PyObject * |
| Noddy_name(Noddy* self) |
| { |
| static PyObject *format = NULL; |
| PyObject *args, *result; |
| |
| if (format == NULL) { |
| format = PyString_FromString("%s %s"); |
| if (format == NULL) |
| return NULL; |
| } |
| |
| if (self->first == NULL) { |
| PyErr_SetString(PyExc_AttributeError, "first"); |
| return NULL; |
| } |
| |
| if (self->last == NULL) { |
| PyErr_SetString(PyExc_AttributeError, "last"); |
| return NULL; |
| } |
| |
| args = Py_BuildValue("OO", self->first, self->last); |
| if (args == NULL) |
| return NULL; |
| |
| result = PyString_Format(format, args); |
| Py_DECREF(args); |
| |
| return result; |
| } |
| \end{verbatim} |
| |
| The method is implemented as a C function that takes a \class{Noddy} (or |
| \class{Noddy} subclass) instance as the first argument. Methods |
| always take an instance as the first argument. Methods often take |
| positional and keyword arguments as well, but in this cased we don't |
| take any and don't need to accept a positional argument tuple or |
| keyword argument dictionary. This method is equivalent to the Python |
| method: |
| |
| \begin{verbatim} |
| def name(self): |
| return "%s %s" % (self.first, self.last) |
| \end{verbatim} |
| |
| Note that we have to check for the possibility that our \member{first} |
| and \member{last} members are \NULL. This is because they can be |
| deleted, in which case they are set to \NULL. It would be better to |
| prevent deletion of these attributes and to restrict the attribute |
| values to be strings. We'll see how to do that in the next section. |
| |
| Now that we've defined the method, we need to create an array of |
| method definitions: |
| |
| \begin{verbatim} |
| static PyMethodDef Noddy_methods[] = { |
| {"name", (PyCFunction)Noddy_name, METH_NOARGS, |
| "Return the name, combining the first and last name" |
| }, |
| {NULL} /* Sentinel */ |
| }; |
| \end{verbatim} |
| |
| and assign them to the \member{tp_methods} slot: |
| |
| \begin{verbatim} |
| Noddy_methods, /* tp_methods */ |
| \end{verbatim} |
| |
| Note that we used the \constant{METH_NOARGS} flag to indicate that the |
| method is passed no arguments. |
| |
| Finally, we'll make our type usable as a base class. We've written |
| our methods carefully so far so that they don't make any assumptions |
| about the type of the object being created or used, so all we need to |
| do is to add the \constant{Py_TPFLAGS_BASETYPE} to our class flag |
| definition: |
| |
| \begin{verbatim} |
| Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /*tp_flags*/ |
| \end{verbatim} |
| |
| We rename \cfunction{initnoddy()} to \cfunction{initnoddy2()} |
| and update the module name passed to \cfunction{Py_InitModule3()}. |
| |
| Finally, we update our \file{setup.py} file to build the new module: |
| |
| \begin{verbatim} |
| from distutils.core import setup, Extension |
| setup(name="noddy", version="1.0", |
| ext_modules=[ |
| Extension("noddy", ["noddy.c"]), |
| Extension("noddy2", ["noddy2.c"]), |
| ]) |
| \end{verbatim} |
| |
| \subsection{Providing finer control over data attributes} |
| |
| In this section, we'll provide finer control over how the |
| \member{first} and \member{last} attributes are set in the |
| \class{Noddy} example. In the previous version of our module, the |
| instance variables \member{first} and \member{last} could be set to |
| non-string values or even deleted. We want to make sure that these |
| attributes always contain strings. |
| |
| \verbatiminput{noddy3.c} |
| |
| To provide greater control, over the \member{first} and \member{last} |
| attributes, we'll use custom getter and setter functions. Here are |
| the functions for getting and setting the \member{first} attribute: |
| |
| \begin{verbatim} |
| Noddy_getfirst(Noddy *self, void *closure) |
| { |
| Py_INCREF(self->first); |
| return self->first; |
| } |
| |
| static int |
| Noddy_setfirst(Noddy *self, PyObject *value, void *closure) |
| { |
| if (value == NULL) { |
| PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute"); |
| return -1; |
| } |
| |
| if (! PyString_Check(value)) { |
| PyErr_SetString(PyExc_TypeError, |
| "The first attribute value must be a string"); |
| return -1; |
| } |
| |
| Py_DECREF(self->first); |
| Py_INCREF(value); |
| self->first = value; |
| |
| return 0; |
| } |
| \end{verbatim} |
| |
| The getter function is passed a \class{Noddy} object and a |
| ``closure'', which is void pointer. In this case, the closure is |
| ignored. (The closure supports an advanced usage in which definition |
| data is passed to the getter and setter. This could, for example, be |
| used to allow a single set of getter and setter functions that decide |
| the attribute to get or set based on data in the closure.) |
| |
| The setter function is passed the \class{Noddy} object, the new value, |
| and the closure. The new value may be \NULL, in which case the |
| attribute is being deleted. In our setter, we raise an error if the |
| attribute is deleted or if the attribute value is not a string. |
| |
| We create an array of \ctype{PyGetSetDef} structures: |
| |
| \begin{verbatim} |
| static PyGetSetDef Noddy_getseters[] = { |
| {"first", |
| (getter)Noddy_getfirst, (setter)Noddy_setfirst, |
| "first name", |
| NULL}, |
| {"last", |
| (getter)Noddy_getlast, (setter)Noddy_setlast, |
| "last name", |
| NULL}, |
| {NULL} /* Sentinel */ |
| }; |
| \end{verbatim} |
| |
| and register it in the \member{tp_getset} slot: |
| |
| \begin{verbatim} |
| Noddy_getseters, /* tp_getset */ |
| \end{verbatim} |
| |
| to register out attribute getters and setters. |
| |
| The last item in a \ctype{PyGetSetDef} structure is the closure |
| mentioned above. In this case, we aren't using the closure, so we just |
| pass \NULL. |
| |
| We also remove the member definitions for these attributes: |
| |
| \begin{verbatim} |
| static PyMemberDef Noddy_members[] = { |
| {"number", T_INT, offsetof(Noddy, number), 0, |
| "noddy number"}, |
| {NULL} /* Sentinel */ |
| }; |
| \end{verbatim} |
| |
| We also need to update the \member{tp_init} handler to only allow |
| strings\footnote{We now know that the first and last members are strings, |
| so perhaps we could be less careful about decrementing their |
| reference counts, however, we accept instances of string subclasses. |
| Even though deallocating normal strings won't call back into our |
| objects, we can't guarantee that deallocating an instance of a string |
| subclass won't. call back into out objects.} to be passed: |
| |
| \begin{verbatim} |
| static int |
| Noddy_init(Noddy *self, PyObject *args, PyObject *kwds) |
| { |
| PyObject *first=NULL, *last=NULL, *tmp; |
| |
| static char *kwlist[] = {"first", "last", "number", NULL}; |
| |
| if (! PyArg_ParseTupleAndKeywords(args, kwds, "|SSi", kwlist, |
| &first, &last, |
| &self->number)) |
| return -1; |
| |
| if (first) { |
| tmp = self->first; |
| Py_INCREF(first); |
| self->first = first; |
| Py_DECREF(tmp); |
| } |
| |
| if (last) { |
| tmp = self->last; |
| Py_INCREF(last); |
| self->last = last; |
| Py_DECREF(tmp); |
| } |
| |
| return 0; |
| } |
| \end{verbatim} |
| |
| With these changes, we can assure that the \member{first} and |
| \member{last} members are never \NULL{} so we can remove checks for \NULL{} |
| values in almost all cases. This means that most of the |
| \cfunction{Py_XDECREF()} calls can be converted to \cfunction{Py_DECREF()} |
| calls. The only place we can't change these calls is in the |
| deallocator, where there is the possibility that the initialization of |
| these members failed in the constructor. |
| |
| We also rename the module initialization function and module name in |
| the initialization function, as we did before, and we add an extra |
| definition to the \file{setup.py} file. |
| |
| \subsection{Supporting cyclic garbage collection} |
| |
| Python has a cyclic-garbage collector that can identify unneeded |
| objects even when their reference counts are not zero. This can happen |
| when objects are involved in cycles. For example, consider: |
| |
| \begin{verbatim} |
| >>> l = [] |
| >>> l.append(l) |
| >>> del l |
| \end{verbatim} |
| |
| In this example, we create a list that contains itself. When we delete |
| it, it still has a reference from itself. Its reference count doesn't |
| drop to zero. Fortunately, Python's cyclic-garbage collector will |
| eventually figure out that the list is garbage and free it. |
| |
| In the second version of the \class{Noddy} example, we allowed any |
| kind of object to be stored in the \member{first} or \member{last} |
| attributes.\footnote{Even in the third version, we aren't guaranteed to |
| avoid cycles. Instances of string subclasses are allowed and string |
| subclasses could allow cycles even if normal strings don't.} This |
| means that \class{Noddy} objects can participate in cycles: |
| |
| \begin{verbatim} |
| >>> import noddy2 |
| >>> n = noddy2.Noddy() |
| >>> l = [n] |
| >>> n.first = l |
| \end{verbatim} |
| |
| This is pretty silly, but it gives us an excuse to add support for the |
| cyclic-garbage collector to the \class{Noddy} example. To support |
| cyclic garbage collection, types need to fill two slots and set a |
| class flag that enables these slots: |
| |
| \verbatiminput{noddy4.c} |
| |
| The traversal method provides access to subobjects that |
| could participate in cycles: |
| |
| \begin{verbatim} |
| static int |
| Noddy_traverse(Noddy *self, visitproc visit, void *arg) |
| { |
| int vret; |
| |
| if (self->first) { |
| vret = visit(self->first, arg); |
| if (vret != 0) |
| return vret; |
| } |
| if (self->last) { |
| vret = visit(self->last, arg); |
| if (vret != 0) |
| return vret; |
| } |
| |
| return 0; |
| } |
| \end{verbatim} |
| |
| For each subobject that can participate in cycles, we need to call the |
| \cfunction{visit()} function, which is passed to the traversal method. |
| The \cfunction{visit()} function takes as arguments the subobject and |
| the extra argument \var{arg} passed to the traversal method. It |
| returns an integer value that must be returned if it is non-zero. |
| |
| |
| Python 2.4 and higher provide a \cfunction{Py_VISIT()} macro that automates |
| calling visit functions. With \cfunction{Py_VISIT()}, |
| \cfunction{Noddy_traverse()} can be simplified: |
| |
| |
| \begin{verbatim} |
| static int |
| Noddy_traverse(Noddy *self, visitproc visit, void *arg) |
| { |
| Py_VISIT(self->first); |
| Py_VISIT(self->last); |
| return 0; |
| } |
| \end{verbatim} |
| |
| \note{Note that the \member{tp_traverse} implementation must name its |
| arguments exactly \var{visit} and \var{arg} in order to use |
| \cfunction{Py_VISIT()}. This is to encourage uniformity |
| across these boring implementations.} |
| |
| We also need to provide a method for clearing any subobjects that can |
| participate in cycles. We implement the method and reimplement the |
| deallocator to use it: |
| |
| \begin{verbatim} |
| static int |
| Noddy_clear(Noddy *self) |
| { |
| PyObject *tmp; |
| |
| tmp = self->first; |
| self->first = NULL; |
| Py_XDECREF(tmp); |
| |
| tmp = self->last; |
| self->last = NULL; |
| Py_XDECREF(tmp); |
| |
| return 0; |
| } |
| |
| static void |
| Noddy_dealloc(Noddy* self) |
| { |
| Noddy_clear(self); |
| self->ob_type->tp_free((PyObject*)self); |
| } |
| \end{verbatim} |
| |
| Notice the use of a temporary variable in \cfunction{Noddy_clear()}. |
| We use the temporary variable so that we can set each member to \NULL{} |
| before decrementing its reference count. We do this because, as was |
| discussed earlier, if the reference count drops to zero, we might |
| cause code to run that calls back into the object. In addition, |
| because we now support garbage collection, we also have to worry about |
| code being run that triggers garbage collection. If garbage |
| collection is run, our \member{tp_traverse} handler could get called. |
| We can't take a chance of having \cfunction{Noddy_traverse()} called |
| when a member's reference count has dropped to zero and its value |
| hasn't been set to \NULL. |
| |
| Python 2.4 and higher provide a \cfunction{Py_CLEAR()} that automates |
| the careful decrementing of reference counts. With |
| \cfunction{Py_CLEAR()}, the \cfunction{Noddy_clear()} function can be |
| simplified: |
| |
| \begin{verbatim} |
| static int |
| Noddy_clear(Noddy *self) |
| { |
| Py_CLEAR(self->first); |
| Py_CLEAR(self->last); |
| return 0; |
| } |
| \end{verbatim} |
| |
| Finally, we add the \constant{Py_TPFLAGS_HAVE_GC} flag to the class |
| flags: |
| |
| \begin{verbatim} |
| Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, /*tp_flags*/ |
| \end{verbatim} |
| |
| That's pretty much it. If we had written custom \member{tp_alloc} or |
| \member{tp_free} slots, we'd need to modify them for cyclic-garbage |
| collection. Most extensions will use the versions automatically |
| provided. |
| |
| \section{Type Methods |
| \label{dnt-type-methods}} |
| |
| This section aims to give a quick fly-by on the various type methods |
| you can implement and what they do. |
| |
| Here is the definition of \ctype{PyTypeObject}, with some fields only |
| used in debug builds omitted: |
| |
| \verbatiminput{typestruct.h} |
| |
| Now that's a \emph{lot} of methods. Don't worry too much though - if |
| you have a type you want to define, the chances are very good that you |
| will only implement a handful of these. |
| |
| As you probably expect by now, we're going to go over this and give |
| more information about the various handlers. We won't go in the order |
| they are defined in the structure, because there is a lot of |
| historical baggage that impacts the ordering of the fields; be sure |
| your type initialization keeps the fields in the right order! It's |
| often easiest to find an example that includes all the fields you need |
| (even if they're initialized to \code{0}) and then change the values |
| to suit your new type. |
| |
| \begin{verbatim} |
| char *tp_name; /* For printing */ |
| \end{verbatim} |
| |
| The name of the type - as mentioned in the last section, this will |
| appear in various places, almost entirely for diagnostic purposes. |
| Try to choose something that will be helpful in such a situation! |
| |
| \begin{verbatim} |
| int tp_basicsize, tp_itemsize; /* For allocation */ |
| \end{verbatim} |
| |
| These fields tell the runtime how much memory to allocate when new |
| objects of this type are created. Python has some built-in support |
| for variable length structures (think: strings, lists) which is where |
| the \member{tp_itemsize} field comes in. This will be dealt with |
| later. |
| |
| \begin{verbatim} |
| char *tp_doc; |
| \end{verbatim} |
| |
| Here you can put a string (or its address) that you want returned when |
| the Python script references \code{obj.__doc__} to retrieve the |
| doc string. |
| |
| Now we come to the basic type methods---the ones most extension types |
| will implement. |
| |
| |
| \subsection{Finalization and De-allocation} |
| |
| \index{object!deallocation} |
| \index{deallocation, object} |
| \index{object!finalization} |
| \index{finalization, of objects} |
| |
| \begin{verbatim} |
| destructor tp_dealloc; |
| \end{verbatim} |
| |
| This function is called when the reference count of the instance of |
| your type is reduced to zero and the Python interpreter wants to |
| reclaim it. If your type has memory to free or other clean-up to |
| perform, put it here. The object itself needs to be freed here as |
| well. Here is an example of this function: |
| |
| \begin{verbatim} |
| static void |
| newdatatype_dealloc(newdatatypeobject * obj) |
| { |
| free(obj->obj_UnderlyingDatatypePtr); |
| obj->ob_type->tp_free(obj); |
| } |
| \end{verbatim} |
| |
| One important requirement of the deallocator function is that it |
| leaves any pending exceptions alone. This is important since |
| deallocators are frequently called as the interpreter unwinds the |
| Python stack; when the stack is unwound due to an exception (rather |
| than normal returns), nothing is done to protect the deallocators from |
| seeing that an exception has already been set. Any actions which a |
| deallocator performs which may cause additional Python code to be |
| executed may detect that an exception has been set. This can lead to |
| misleading errors from the interpreter. The proper way to protect |
| against this is to save a pending exception before performing the |
| unsafe action, and restoring it when done. This can be done using the |
| \cfunction{PyErr_Fetch()}\ttindex{PyErr_Fetch()} and |
| \cfunction{PyErr_Restore()}\ttindex{PyErr_Restore()} functions: |
| |
| \begin{verbatim} |
| static void |
| my_dealloc(PyObject *obj) |
| { |
| MyObject *self = (MyObject *) obj; |
| PyObject *cbresult; |
| |
| if (self->my_callback != NULL) { |
| PyObject *err_type, *err_value, *err_traceback; |
| int have_error = PyErr_Occurred() ? 1 : 0; |
| |
| if (have_error) |
| PyErr_Fetch(&err_type, &err_value, &err_traceback); |
| |
| cbresult = PyObject_CallObject(self->my_callback, NULL); |
| if (cbresult == NULL) |
| PyErr_WriteUnraisable(self->my_callback); |
| else |
| Py_DECREF(cbresult); |
| |
| if (have_error) |
| PyErr_Restore(err_type, err_value, err_traceback); |
| |
| Py_DECREF(self->my_callback); |
| } |
| obj->ob_type->tp_free((PyObject*)self); |
| } |
| \end{verbatim} |
| |
| |
| \subsection{Object Presentation} |
| |
| In Python, there are three ways to generate a textual representation |
| of an object: the \function{repr()}\bifuncindex{repr} function (or |
| equivalent back-tick syntax), the \function{str()}\bifuncindex{str} |
| function, and the \keyword{print} statement. For most objects, the |
| \keyword{print} statement is equivalent to the \function{str()} |
| function, but it is possible to special-case printing to a |
| \ctype{FILE*} if necessary; this should only be done if efficiency is |
| identified as a problem and profiling suggests that creating a |
| temporary string object to be written to a file is too expensive. |
| |
| These handlers are all optional, and most types at most need to |
| implement the \member{tp_str} and \member{tp_repr} handlers. |
| |
| \begin{verbatim} |
| reprfunc tp_repr; |
| reprfunc tp_str; |
| printfunc tp_print; |
| \end{verbatim} |
| |
| The \member{tp_repr} handler should return a string object containing |
| a representation of the instance for which it is called. Here is a |
| simple example: |
| |
| \begin{verbatim} |
| static PyObject * |
| newdatatype_repr(newdatatypeobject * obj) |
| { |
| return PyString_FromFormat("Repr-ified_newdatatype{{size:\%d}}", |
| obj->obj_UnderlyingDatatypePtr->size); |
| } |
| \end{verbatim} |
| |
| If no \member{tp_repr} handler is specified, the interpreter will |
| supply a representation that uses the type's \member{tp_name} and a |
| uniquely-identifying value for the object. |
| |
| The \member{tp_str} handler is to \function{str()} what the |
| \member{tp_repr} handler described above is to \function{repr()}; that |
| is, it is called when Python code calls \function{str()} on an |
| instance of your object. Its implementation is very similar to the |
| \member{tp_repr} function, but the resulting string is intended for |
| human consumption. If \member{tp_str} is not specified, the |
| \member{tp_repr} handler is used instead. |
| |
| Here is a simple example: |
| |
| \begin{verbatim} |
| static PyObject * |
| newdatatype_str(newdatatypeobject * obj) |
| { |
| return PyString_FromFormat("Stringified_newdatatype{{size:\%d}}", |
| obj->obj_UnderlyingDatatypePtr->size); |
| } |
| \end{verbatim} |
| |
| The print function will be called whenever Python needs to "print" an |
| instance of the type. For example, if 'node' is an instance of type |
| TreeNode, then the print function is called when Python code calls: |
| |
| \begin{verbatim} |
| print node |
| \end{verbatim} |
| |
| There is a flags argument and one flag, \constant{Py_PRINT_RAW}, and |
| it suggests that you print without string quotes and possibly without |
| interpreting escape sequences. |
| |
| The print function receives a file object as an argument. You will |
| likely want to write to that file object. |
| |
| Here is a sample print function: |
| |
| \begin{verbatim} |
| static int |
| newdatatype_print(newdatatypeobject *obj, FILE *fp, int flags) |
| { |
| if (flags & Py_PRINT_RAW) { |
| fprintf(fp, "<{newdatatype object--size: %d}>", |
| obj->obj_UnderlyingDatatypePtr->size); |
| } |
| else { |
| fprintf(fp, "\"<{newdatatype object--size: %d}>\"", |
| obj->obj_UnderlyingDatatypePtr->size); |
| } |
| return 0; |
| } |
| \end{verbatim} |
| |
| |
| \subsection{Attribute Management} |
| |
| For every object which can support attributes, the corresponding type |
| must provide the functions that control how the attributes are |
| resolved. There needs to be a function which can retrieve attributes |
| (if any are defined), and another to set attributes (if setting |
| attributes is allowed). Removing an attribute is a special case, for |
| which the new value passed to the handler is \NULL. |
| |
| Python supports two pairs of attribute handlers; a type that supports |
| attributes only needs to implement the functions for one pair. The |
| difference is that one pair takes the name of the attribute as a |
| \ctype{char*}, while the other accepts a \ctype{PyObject*}. Each type |
| can use whichever pair makes more sense for the implementation's |
| convenience. |
| |
| \begin{verbatim} |
| getattrfunc tp_getattr; /* char * version */ |
| setattrfunc tp_setattr; |
| /* ... */ |
| getattrofunc tp_getattrofunc; /* PyObject * version */ |
| setattrofunc tp_setattrofunc; |
| \end{verbatim} |
| |
| If accessing attributes of an object is always a simple operation |
| (this will be explained shortly), there are generic implementations |
| which can be used to provide the \ctype{PyObject*} version of the |
| attribute management functions. The actual need for type-specific |
| attribute handlers almost completely disappeared starting with Python |
| 2.2, though there are many examples which have not been updated to use |
| some of the new generic mechanism that is available. |
| |
| |
| \subsubsection{Generic Attribute Management} |
| |
| \versionadded{2.2} |
| |
| Most extension types only use \emph{simple} attributes. So, what |
| makes the attributes simple? There are only a couple of conditions |
| that must be met: |
| |
| \begin{enumerate} |
| \item The name of the attributes must be known when |
| \cfunction{PyType_Ready()} is called. |
| |
| \item No special processing is needed to record that an attribute |
| was looked up or set, nor do actions need to be taken based |
| on the value. |
| \end{enumerate} |
| |
| Note that this list does not place any restrictions on the values of |
| the attributes, when the values are computed, or how relevant data is |
| stored. |
| |
| When \cfunction{PyType_Ready()} is called, it uses three tables |
| referenced by the type object to create \emph{descriptors} which are |
| placed in the dictionary of the type object. Each descriptor controls |
| access to one attribute of the instance object. Each of the tables is |
| optional; if all three are \NULL, instances of the type will only have |
| attributes that are inherited from their base type, and should leave |
| the \member{tp_getattro} and \member{tp_setattro} fields \NULL{} as |
| well, allowing the base type to handle attributes. |
| |
| The tables are declared as three fields of the type object: |
| |
| \begin{verbatim} |
| struct PyMethodDef *tp_methods; |
| struct PyMemberDef *tp_members; |
| struct PyGetSetDef *tp_getset; |
| \end{verbatim} |
| |
| If \member{tp_methods} is not \NULL, it must refer to an array of |
| \ctype{PyMethodDef} structures. Each entry in the table is an |
| instance of this structure: |
| |
| \begin{verbatim} |
| typedef struct PyMethodDef { |
| char *ml_name; /* method name */ |
| PyCFunction ml_meth; /* implementation function */ |
| int ml_flags; /* flags */ |
| char *ml_doc; /* docstring */ |
| } PyMethodDef; |
| \end{verbatim} |
| |
| One entry should be defined for each method provided by the type; no |
| entries are needed for methods inherited from a base type. One |
| additional entry is needed at the end; it is a sentinel that marks the |
| end of the array. The \member{ml_name} field of the sentinel must be |
| \NULL. |
| |
| XXX Need to refer to some unified discussion of the structure fields, |
| shared with the next section. |
| |
| The second table is used to define attributes which map directly to |
| data stored in the instance. A variety of primitive C types are |
| supported, and access may be read-only or read-write. The structures |
| in the table are defined as: |
| |
| \begin{verbatim} |
| typedef struct PyMemberDef { |
| char *name; |
| int type; |
| int offset; |
| int flags; |
| char *doc; |
| } PyMemberDef; |
| \end{verbatim} |
| |
| For each entry in the table, a descriptor will be constructed and |
| added to the type which will be able to extract a value from the |
| instance structure. The \member{type} field should contain one of the |
| type codes defined in the \file{structmember.h} header; the value will |
| be used to determine how to convert Python values to and from C |
| values. The \member{flags} field is used to store flags which control |
| how the attribute can be accessed. |
| |
| XXX Need to move some of this to a shared section! |
| |
| The following flag constants are defined in \file{structmember.h}; |
| they may be combined using bitwise-OR. |
| |
| \begin{tableii}{l|l}{constant}{Constant}{Meaning} |
| \lineii{READONLY \ttindex{READONLY}} |
| {Never writable.} |
| \lineii{RO \ttindex{RO}} |
| {Shorthand for \constant{READONLY}.} |
| \lineii{READ_RESTRICTED \ttindex{READ_RESTRICTED}} |
| {Not readable in restricted mode.} |
| \lineii{WRITE_RESTRICTED \ttindex{WRITE_RESTRICTED}} |
| {Not writable in restricted mode.} |
| \lineii{RESTRICTED \ttindex{RESTRICTED}} |
| {Not readable or writable in restricted mode.} |
| \end{tableii} |
| |
| An interesting advantage of using the \member{tp_members} table to |
| build descriptors that are used at runtime is that any attribute |
| defined this way can have an associated doc string simply by providing |
| the text in the table. An application can use the introspection API |
| to retrieve the descriptor from the class object, and get the |
| doc string using its \member{__doc__} attribute. |
| |
| As with the \member{tp_methods} table, a sentinel entry with a |
| \member{name} value of \NULL{} is required. |
| |
| |
| % XXX Descriptors need to be explained in more detail somewhere, but |
| % not here. |
| % |
| % Descriptor objects have two handler functions which correspond to |
| % the \member{tp_getattro} and \member{tp_setattro} handlers. The |
| % \method{__get__()} handler is a function which is passed the |
| % descriptor, instance, and type objects, and returns the value of the |
| % attribute, or it returns \NULL{} and sets an exception. The |
| % \method{__set__()} handler is passed the descriptor, instance, type, |
| % and new value; |
| |
| |
| \subsubsection{Type-specific Attribute Management} |
| |
| For simplicity, only the \ctype{char*} version will be demonstrated |
| here; the type of the name parameter is the only difference between |
| the \ctype{char*} and \ctype{PyObject*} flavors of the interface. |
| This example effectively does the same thing as the generic example |
| above, but does not use the generic support added in Python 2.2. The |
| value in showing this is two-fold: it demonstrates how basic attribute |
| management can be done in a way that is portable to older versions of |
| Python, and explains how the handler functions are called, so that if |
| you do need to extend their functionality, you'll understand what |
| needs to be done. |
| |
| The \member{tp_getattr} handler is called when the object requires an |
| attribute look-up. It is called in the same situations where the |
| \method{__getattr__()} method of a class would be called. |
| |
| A likely way to handle this is (1) to implement a set of functions |
| (such as \cfunction{newdatatype_getSize()} and |
| \cfunction{newdatatype_setSize()} in the example below), (2) provide a |
| method table listing these functions, and (3) provide a getattr |
| function that returns the result of a lookup in that table. The |
| method table uses the same structure as the \member{tp_methods} field |
| of the type object. |
| |
| Here is an example: |
| |
| \begin{verbatim} |
| static PyMethodDef newdatatype_methods[] = { |
| {"getSize", (PyCFunction)newdatatype_getSize, METH_VARARGS, |
| "Return the current size."}, |
| {"setSize", (PyCFunction)newdatatype_setSize, METH_VARARGS, |
| "Set the size."}, |
| {NULL, NULL, 0, NULL} /* sentinel */ |
| }; |
| |
| static PyObject * |
| newdatatype_getattr(newdatatypeobject *obj, char *name) |
| { |
| return Py_FindMethod(newdatatype_methods, (PyObject *)obj, name); |
| } |
| \end{verbatim} |
| |
| The \member{tp_setattr} handler is called when the |
| \method{__setattr__()} or \method{__delattr__()} method of a class |
| instance would be called. When an attribute should be deleted, the |
| third parameter will be \NULL. Here is an example that simply raises |
| an exception; if this were really all you wanted, the |
| \member{tp_setattr} handler should be set to \NULL. |
| |
| \begin{verbatim} |
| static int |
| newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v) |
| { |
| (void)PyErr_Format(PyExc_RuntimeError, "Read-only attribute: \%s", name); |
| return -1; |
| } |
| \end{verbatim} |
| |
| |
| \subsection{Object Comparison} |
| |
| \begin{verbatim} |
| cmpfunc tp_compare; |
| \end{verbatim} |
| |
| The \member{tp_compare} handler is called when comparisons are needed |
| and the object does not implement the specific rich comparison method |
| which matches the requested comparison. (It is always used if defined |
| and the \cfunction{PyObject_Compare()} or \cfunction{PyObject_Cmp()} |
| functions are used, or if \function{cmp()} is used from Python.) |
| It is analogous to the \method{__cmp__()} method. This function |
| should return \code{-1} if \var{obj1} is less than |
| \var{obj2}, \code{0} if they are equal, and \code{1} if |
| \var{obj1} is greater than |
| \var{obj2}. |
| (It was previously allowed to return arbitrary negative or positive |
| integers for less than and greater than, respectively; as of Python |
| 2.2, this is no longer allowed. In the future, other return values |
| may be assigned a different meaning.) |
| |
| A \member{tp_compare} handler may raise an exception. In this case it |
| should return a negative value. The caller has to test for the |
| exception using \cfunction{PyErr_Occurred()}. |
| |
| |
| Here is a sample implementation: |
| |
| \begin{verbatim} |
| static int |
| newdatatype_compare(newdatatypeobject * obj1, newdatatypeobject * obj2) |
| { |
| long result; |
| |
| if (obj1->obj_UnderlyingDatatypePtr->size < |
| obj2->obj_UnderlyingDatatypePtr->size) { |
| result = -1; |
| } |
| else if (obj1->obj_UnderlyingDatatypePtr->size > |
| obj2->obj_UnderlyingDatatypePtr->size) { |
| result = 1; |
| } |
| else { |
| result = 0; |
| } |
| return result; |
| } |
| \end{verbatim} |
| |
| |
| \subsection{Abstract Protocol Support} |
| |
| Python supports a variety of \emph{abstract} `protocols;' the specific |
| interfaces provided to use these interfaces are documented in the |
| \citetitle[../api/api.html]{Python/C API Reference Manual} in the |
| chapter ``\ulink{Abstract Objects Layer}{../api/abstract.html}.'' |
| |
| A number of these abstract interfaces were defined early in the |
| development of the Python implementation. In particular, the number, |
| mapping, and sequence protocols have been part of Python since the |
| beginning. Other protocols have been added over time. For protocols |
| which depend on several handler routines from the type implementation, |
| the older protocols have been defined as optional blocks of handlers |
| referenced by the type object. For newer protocols there are |
| additional slots in the main type object, with a flag bit being set to |
| indicate that the slots are present and should be checked by the |
| interpreter. (The flag bit does not indicate that the slot values are |
| non-\NULL. The flag may be set to indicate the presence of a slot, |
| but a slot may still be unfilled.) |
| |
| \begin{verbatim} |
| PyNumberMethods tp_as_number; |
| PySequenceMethods tp_as_sequence; |
| PyMappingMethods tp_as_mapping; |
| \end{verbatim} |
| |
| If you wish your object to be able to act like a number, a sequence, |
| or a mapping object, then you place the address of a structure that |
| implements the C type \ctype{PyNumberMethods}, |
| \ctype{PySequenceMethods}, or \ctype{PyMappingMethods}, respectively. |
| It is up to you to fill in this structure with appropriate values. You |
| can find examples of the use of each of these in the \file{Objects} |
| directory of the Python source distribution. |
| |
| |
| \begin{verbatim} |
| hashfunc tp_hash; |
| \end{verbatim} |
| |
| This function, if you choose to provide it, should return a hash |
| number for an instance of your data type. Here is a moderately |
| pointless example: |
| |
| \begin{verbatim} |
| static long |
| newdatatype_hash(newdatatypeobject *obj) |
| { |
| long result; |
| result = obj->obj_UnderlyingDatatypePtr->size; |
| result = result * 3; |
| return result; |
| } |
| \end{verbatim} |
| |
| \begin{verbatim} |
| ternaryfunc tp_call; |
| \end{verbatim} |
| |
| This function is called when an instance of your data type is "called", |
| for example, if \code{obj1} is an instance of your data type and the Python |
| script contains \code{obj1('hello')}, the \member{tp_call} handler is |
| invoked. |
| |
| This function takes three arguments: |
| |
| \begin{enumerate} |
| \item |
| \var{arg1} is the instance of the data type which is the subject of |
| the call. If the call is \code{obj1('hello')}, then \var{arg1} is |
| \code{obj1}. |
| |
| \item |
| \var{arg2} is a tuple containing the arguments to the call. You |
| can use \cfunction{PyArg_ParseTuple()} to extract the arguments. |
| |
| \item |
| \var{arg3} is a dictionary of keyword arguments that were passed. |
| If this is non-\NULL{} and you support keyword arguments, use |
| \cfunction{PyArg_ParseTupleAndKeywords()} to extract the |
| arguments. If you do not want to support keyword arguments and |
| this is non-\NULL, raise a \exception{TypeError} with a message |
| saying that keyword arguments are not supported. |
| \end{enumerate} |
| |
| Here is a desultory example of the implementation of the call function. |
| |
| \begin{verbatim} |
| /* Implement the call function. |
| * obj1 is the instance receiving the call. |
| * obj2 is a tuple containing the arguments to the call, in this |
| * case 3 strings. |
| */ |
| static PyObject * |
| newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *other) |
| { |
| PyObject *result; |
| char *arg1; |
| char *arg2; |
| char *arg3; |
| |
| if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) { |
| return NULL; |
| } |
| result = PyString_FromFormat( |
| "Returning -- value: [\%d] arg1: [\%s] arg2: [\%s] arg3: [\%s]\n", |
| obj->obj_UnderlyingDatatypePtr->size, |
| arg1, arg2, arg3); |
| printf("\%s", PyString_AS_STRING(result)); |
| return result; |
| } |
| \end{verbatim} |
| |
| XXX some fields need to be added here... |
| |
| |
| \begin{verbatim} |
| /* Added in release 2.2 */ |
| /* Iterators */ |
| getiterfunc tp_iter; |
| iternextfunc tp_iternext; |
| \end{verbatim} |
| |
| These functions provide support for the iterator protocol. Any object |
| which wishes to support iteration over its contents (which may be |
| generated during iteration) must implement the \code{tp_iter} |
| handler. Objects which are returned by a \code{tp_iter} handler must |
| implement both the \code{tp_iter} and \code{tp_iternext} handlers. |
| Both handlers take exactly one parameter, the instance for which they |
| are being called, and return a new reference. In the case of an |
| error, they should set an exception and return \NULL. |
| |
| For an object which represents an iterable collection, the |
| \code{tp_iter} handler must return an iterator object. The iterator |
| object is responsible for maintaining the state of the iteration. For |
| collections which can support multiple iterators which do not |
| interfere with each other (as lists and tuples do), a new iterator |
| should be created and returned. Objects which can only be iterated |
| over once (usually due to side effects of iteration) should implement |
| this handler by returning a new reference to themselves, and should |
| also implement the \code{tp_iternext} handler. File objects are an |
| example of such an iterator. |
| |
| Iterator objects should implement both handlers. The \code{tp_iter} |
| handler should return a new reference to the iterator (this is the |
| same as the \code{tp_iter} handler for objects which can only be |
| iterated over destructively). The \code{tp_iternext} handler should |
| return a new reference to the next object in the iteration if there is |
| one. If the iteration has reached the end, it may return \NULL{} |
| without setting an exception or it may set \exception{StopIteration}; |
| avoiding the exception can yield slightly better performance. If an |
| actual error occurs, it should set an exception and return \NULL. |
| |
| |
| \subsection{Weak Reference Support\label{weakref-support}} |
| |
| One of the goals of Python's weak-reference implementation is to allow |
| any type to participate in the weak reference mechanism without |
| incurring the overhead on those objects which do not benefit by weak |
| referencing (such as numbers). |
| |
| For an object to be weakly referencable, the extension must include a |
| \ctype{PyObject*} field in the instance structure for the use of the |
| weak reference mechanism; it must be initialized to \NULL{} by the |
| object's constructor. It must also set the \member{tp_weaklistoffset} |
| field of the corresponding type object to the offset of the field. |
| For example, the instance type is defined with the following |
| structure: |
| |
| \begin{verbatim} |
| typedef struct { |
| PyObject_HEAD |
| PyClassObject *in_class; /* The class object */ |
| PyObject *in_dict; /* A dictionary */ |
| PyObject *in_weakreflist; /* List of weak references */ |
| } PyInstanceObject; |
| \end{verbatim} |
| |
| The statically-declared type object for instances is defined this way: |
| |
| \begin{verbatim} |
| PyTypeObject PyInstance_Type = { |
| PyObject_HEAD_INIT(&PyType_Type) |
| 0, |
| "module.instance", |
| |
| /* Lots of stuff omitted for brevity... */ |
| |
| Py_TPFLAGS_DEFAULT, /* tp_flags */ |
| 0, /* tp_doc */ |
| 0, /* tp_traverse */ |
| 0, /* tp_clear */ |
| 0, /* tp_richcompare */ |
| offsetof(PyInstanceObject, in_weakreflist), /* tp_weaklistoffset */ |
| }; |
| \end{verbatim} |
| |
| The type constructor is responsible for initializing the weak reference |
| list to \NULL: |
| |
| \begin{verbatim} |
| static PyObject * |
| instance_new() { |
| /* Other initialization stuff omitted for brevity */ |
| |
| self->in_weakreflist = NULL; |
| |
| return (PyObject *) self; |
| } |
| \end{verbatim} |
| |
| The only further addition is that the destructor needs to call the |
| weak reference manager to clear any weak references. This should be |
| done before any other parts of the destruction have occurred, but is |
| only required if the weak reference list is non-\NULL: |
| |
| \begin{verbatim} |
| static void |
| instance_dealloc(PyInstanceObject *inst) |
| { |
| /* Allocate temporaries if needed, but do not begin |
| destruction just yet. |
| */ |
| |
| if (inst->in_weakreflist != NULL) |
| PyObject_ClearWeakRefs((PyObject *) inst); |
| |
| /* Proceed with object destruction normally. */ |
| } |
| \end{verbatim} |
| |
| |
| \subsection{More Suggestions} |
| |
| Remember that you can omit most of these functions, in which case you |
| provide \code{0} as a value. There are type definitions for each of |
| the functions you must provide. They are in \file{object.h} in the |
| Python include directory that comes with the source distribution of |
| Python. |
| |
| In order to learn how to implement any specific method for your new |
| data type, do the following: Download and unpack the Python source |
| distribution. Go the \file{Objects} directory, then search the |
| C source files for \code{tp_} plus the function you want (for |
| example, \code{tp_print} or \code{tp_compare}). You will find |
| examples of the function you want to implement. |
| |
| When you need to verify that an object is an instance of the type |
| you are implementing, use the \cfunction{PyObject_TypeCheck} function. |
| A sample of its use might be something like the following: |
| |
| \begin{verbatim} |
| if (! PyObject_TypeCheck(some_object, &MyType)) { |
| PyErr_SetString(PyExc_TypeError, "arg #1 not a mything"); |
| return NULL; |
| } |
| \end{verbatim} |