Antoine Pitrou | 1d80a56 | 2018-04-07 18:14:03 +0200 | [diff] [blame] | 1 | .. highlightlang:: c |
| 2 | |
| 3 | .. _defining-new-types: |
| 4 | |
| 5 | ********************************** |
| 6 | Defining Extension Types: Tutorial |
| 7 | ********************************** |
| 8 | |
| 9 | .. sectionauthor:: Michael Hudson <mwh@python.net> |
| 10 | .. sectionauthor:: Dave Kuhlman <dkuhlman@rexx.com> |
| 11 | .. sectionauthor:: Jim Fulton <jim@zope.com> |
| 12 | |
| 13 | |
| 14 | Python allows the writer of a C extension module to define new types that |
| 15 | can be manipulated from Python code, much like the built-in :class:`str` |
| 16 | and :class:`list` types. The code for all extension types follows a |
| 17 | pattern, but there are some details that you need to understand before you |
| 18 | can get started. This document is a gentle introduction to the topic. |
| 19 | |
| 20 | |
| 21 | .. _dnt-basics: |
| 22 | |
| 23 | The Basics |
| 24 | ========== |
| 25 | |
| 26 | The :term:`CPython` runtime sees all Python objects as variables of type |
| 27 | :c:type:`PyObject\*`, which serves as a "base type" for all Python objects. |
| 28 | The :c:type:`PyObject` structure itself only contains the object's |
| 29 | :term:`reference count` and a pointer to the object's "type object". |
| 30 | This is where the action is; the type object determines which (C) functions |
| 31 | get called by the interpreter when, for instance, an attribute gets looked up |
| 32 | on an object, a method called, or it is multiplied by another object. These |
| 33 | C functions are called "type methods". |
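The role of the type object can be observed from Python itself. The following is a minimal sketch, assuming a hypothetical pure-Python class ``Scaled`` as a stand-in for an extension type (it is not part of the ``custom`` module). It suggests that an operation such as multiplication is looked up on the object's type, and that a same-named attribute stored on the instance is ignored:

```python
class Scaled:
    """Hypothetical pure-Python stand-in for an extension type."""

    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        # Defined on the type, so it plays the part of a "type method".
        return Scaled(self.value * other)


obj = Scaled(3)
# Store a same-named callable on the instance only.
obj.__mul__ = lambda other: "instance override"

# The * operator consults type(obj), not the instance dictionary.
result = obj * 2
print(type(result).__name__, result.value)
```

The instance-level ``__mul__`` is never consulted by the ``*`` operator, which is the Python-level counterpart of the interpreter going through the type object's function pointers.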
| 34 | |
| 35 | So, if you want to define a new extension type, you need to create a new type |
| 36 | object. |
| 37 | |
| 38 | This sort of thing can only be explained by example, so here's a minimal, but |
| 39 | complete, module that defines a new type named :class:`Custom` inside a C |
| 40 | extension module :mod:`custom`: |
| 41 | |
| 42 | .. note:: |
| 43 | What we're showing here is the traditional way of defining *static* |
| 44 | extension types. It should be adequate for most uses. The C API also |
| 45 | allows defining heap-allocated extension types using the |
| 46 | :c:func:`PyType_FromSpec` function, which isn't covered in this tutorial. |
| 47 | |
| 48 | .. literalinclude:: ../includes/custom.c |
| 49 | |
| 50 | Now that's quite a bit to take in at once, but hopefully bits will seem familiar |
| 51 | from the previous chapter. This file defines three things: |
| 52 | |
| 53 | #. What a :class:`Custom` **object** contains: this is the ``CustomObject`` |
| 54 | struct, which is allocated once for each :class:`Custom` instance. |
| 55 | #. How the :class:`Custom` **type** behaves: this is the ``CustomType`` struct, |
| 56 | which defines a set of flags and function pointers that the interpreter |
| 57 | inspects when specific operations are requested. |
| 58 | #. How to initialize the :mod:`custom` module: this is the ``PyInit_custom`` |
| 59 | function and the associated ``custommodule`` struct. |
| 60 | |
| 61 | The first bit is:: |
| 62 | |
| 63 | typedef struct { |
| 64 | PyObject_HEAD |
| 65 | } CustomObject; |
| 66 | |
| 67 | This is what a Custom object will contain. ``PyObject_HEAD`` is mandatory |
| 68 | at the start of each object struct and defines a field called ``ob_base`` |
| 69 | of type :c:type:`PyObject`, containing a pointer to a type object and a |
| 70 | reference count (these can be accessed using the macros :c:macro:`Py_TYPE` |
| 71 | and :c:macro:`Py_REFCNT` respectively). The reason for the macro is to |
| 72 | abstract away the layout and to enable additional fields in debug builds. |
| 73 | |
| 74 | .. note:: |
| 75 | There is no semicolon above after the :c:macro:`PyObject_HEAD` macro. |
| 76 | Be wary of adding one by accident: some compilers will complain. |
| 77 | |
| 78 | Of course, objects generally store additional data besides the standard |
| 79 | ``PyObject_HEAD`` boilerplate; for example, here is the definition for |
| 80 | standard Python floats:: |
| 81 | |
| 82 | typedef struct { |
| 83 | PyObject_HEAD |
| 84 | double ob_fval; |
| 85 | } PyFloatObject; |
| 86 | |
| 87 | The second bit is the definition of the type object. :: |
| 88 | |
| 89 | static PyTypeObject CustomType = { |
| 90 | PyVarObject_HEAD_INIT(NULL, 0) |
| 91 | .tp_name = "custom.Custom", |
| 92 | .tp_doc = "Custom objects", |
| 93 | .tp_basicsize = sizeof(CustomObject), |
| 94 | .tp_itemsize = 0, |
| 95 | .tp_new = PyType_GenericNew, |
| 96 | }; |
| 97 | |
| 98 | .. note:: |
| 99 | We recommend using C99-style designated initializers as above, to |
| 100 | avoid listing all the :c:type:`PyTypeObject` fields that you don't care |
| 101 | about and also to avoid caring about the fields' declaration order. |
| 102 | |
| 103 | The actual definition of :c:type:`PyTypeObject` in :file:`object.h` has |
| 104 | many more :ref:`fields <type-structs>` than the definition above. The |
| 105 | remaining fields will be filled with zeros by the C compiler, and it's |
| 106 | common practice to not specify them explicitly unless you need them. |
| 107 | |
| 108 | We're going to pick it apart, one field at a time:: |
| 109 | |
| 110 | PyVarObject_HEAD_INIT(NULL, 0) |
| 111 | |
| 112 | This line is mandatory boilerplate to initialize the ``ob_base`` |
| 113 | field mentioned above. :: |
| 114 | |
| 115 | .tp_name = "custom.Custom", |
| 116 | |
| 117 | The name of our type. This will appear in the default textual representation of |
| 118 | our objects and in some error messages, for example: |
| 119 | |
| 120 | .. code-block:: pycon |
| 121 | |
| 122 | >>> "" + custom.Custom() |
| 123 | Traceback (most recent call last): |
| 124 | File "<stdin>", line 1, in <module> |
| 125 | TypeError: can only concatenate str (not "custom.Custom") to str |
| 126 | |
| 127 | Note that the name is a dotted name that includes both the module name and the |
| 128 | name of the type within the module. The module in this case is :mod:`custom` and |
| 129 | the type is :class:`Custom`, so we set the type name to :class:`custom.Custom`. |
| 130 | Using the real dotted import path is important to make your type compatible |
| 131 | with the :mod:`pydoc` and :mod:`pickle` modules. :: |
| 132 | |
| 133 | .tp_basicsize = sizeof(CustomObject), |
| 134 | .tp_itemsize = 0, |
| 135 | |
| 136 | This is so that Python knows how much memory to allocate when creating |
| 137 | new :class:`Custom` instances. :c:member:`~PyTypeObject.tp_itemsize` is |
| 138 | only used for variable-sized objects and should otherwise be zero. |
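These two fields are also visible from Python as the ``__basicsize__`` and ``__itemsize__`` attributes of a type. The sketch below only inspects built-in types. The exact byte counts depend on the platform and build, so it compares sizes and does not assume particular numbers:

```python
import struct

pointer_size = struct.calcsize("P")   # size of a C pointer on this build
double_size = struct.calcsize("d")    # size of a C double

# float is fixed-size: the object header plus one double, no per-item storage.
print(float.__basicsize__, float.__itemsize__)
float_extra = float.__basicsize__ - object.__basicsize__

# tuple is variable-sized: each item adds one pointer to the allocation.
print(tuple.__basicsize__, tuple.__itemsize__)
```

For a fixed-size type such as ``Custom``, the item size is zero in the same way as for ``float``.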
| 139 | |
| 140 | .. note:: |
| 141 | |
| 142 | If you want your type to be subclassable from Python, and your type has the same |
| 143 | :c:member:`~PyTypeObject.tp_basicsize` as its base type, you may have problems with multiple |
| 144 | inheritance. A Python subclass of your type will have to list your type first |
| 145 | in its :attr:`~class.__bases__`, or else it will not be able to call your type's |
| 146 | :meth:`__new__` method without getting an error. You can avoid this problem by |
| 147 | ensuring that your type has a larger value for :c:member:`~PyTypeObject.tp_basicsize` than its |
| 148 | base type does. Most of the time, this will be true anyway, because either your |
| 149 | base type will be :class:`object`, or else you will be adding data members to |
| 150 | your base type, and therefore increasing its size. |
| 151 | |
| 152 | We set the class flags to :const:`Py_TPFLAGS_DEFAULT`. :: |
| 153 | |
| 154 | .tp_flags = Py_TPFLAGS_DEFAULT, |
| 155 | |
| 156 | All types should include this constant in their flags. It enables all of the |
| 157 | members defined until at least Python 3.3. If you need further members, |
| 158 | you will need to OR the corresponding flags. |
| 159 | |
| 160 | We provide a doc string for the type in :c:member:`~PyTypeObject.tp_doc`. :: |
| 161 | |
| 162 | .tp_doc = "Custom objects", |
| 163 | |
| 164 | To enable object creation, we have to provide a :c:member:`~PyTypeObject.tp_new` |
| 165 | handler. This is the equivalent of the Python method :meth:`__new__`, but |
| 166 | has to be specified explicitly. In this case, we can just use the default |
| 167 | implementation provided by the API function :c:func:`PyType_GenericNew`. :: |
| 168 | |
| 169 | .tp_new = PyType_GenericNew, |
| 170 | |
| 171 | Everything else in the file should be familiar, except for some code in |
| 172 | :c:func:`PyInit_custom`:: |
| 173 | |
| 174 | if (PyType_Ready(&CustomType) < 0) |
| 175 | return NULL; |
| 176 | |
| 177 | This initializes the :class:`Custom` type, filling in a number of members |
| 178 | to the appropriate default values, including :attr:`ob_type` that we initially |
| 179 | set to *NULL*. :: |
| 180 | |
| 181 | PyModule_AddObject(m, "Custom", (PyObject *) &CustomType); |
| 182 | |
| 183 | This adds the type to the module dictionary. This allows us to create |
| 184 | :class:`Custom` instances by calling the :class:`Custom` class: |
| 185 | |
| 186 | .. code-block:: pycon |
| 187 | |
| 188 | >>> import custom |
| 189 | >>> mycustom = custom.Custom() |
| 190 | |
| 191 | That's it! All that remains is to build it. Put the above code in a file called |
| 192 | :file:`custom.c` and the following in a file called :file:`setup.py`: |
| 193 | |
| 194 | .. code-block:: python |
| 195 | |
| 196 | from distutils.core import setup, Extension |
| 197 | setup(name="custom", version="1.0", |
| 198 | ext_modules=[Extension("custom", ["custom.c"])]) |
| 199 | |
| 200 | Then running |
| 201 | |
| 202 | .. code-block:: shell-session |
| 203 | |
| 204 | $ python setup.py build |
| 205 | |
| 206 | at a shell should produce a file :file:`custom.so` in a subdirectory. Move to |
| 207 | that directory and start Python; you should be able to ``import custom`` and |
| 208 | play around with Custom objects. |
| 209 | |
| 210 | That wasn't so hard, was it? |
| 211 | |
| 212 | Of course, the current Custom type is pretty uninteresting. It has no data and |
| 213 | doesn't do anything. It can't even be subclassed. |
| 214 | |
| 215 | .. note:: |
| 216 | While this documentation showcases the standard :mod:`distutils` module |
| 217 | for building C extensions, it is recommended in real-world use cases to |
| 218 | use the newer and better-maintained ``setuptools`` library. Documentation |
| 219 | on how to do this is out of scope for this document and can be found in |
| 220 | the `Python Packaging User's Guide <https://packaging.python.org/tutorials/distributing-packages/>`_. |
| 221 | |
| 222 | |
| 223 | Adding data and methods to the Basic example |
| 224 | ============================================ |
| 225 | |
| 226 | Let's extend the basic example to add some data and methods. Let's also make |
| 227 | the type usable as a base class. We'll create a new module, :mod:`custom2`, that |
| 228 | adds these capabilities: |
| 229 | |
| 230 | .. literalinclude:: ../includes/custom2.c |
| 231 | |
| 232 | |
| 233 | This version of the module has a number of changes. |
| 234 | |
| 235 | We've added an extra include:: |
| 236 | |
| 237 | #include <structmember.h> |
| 238 | |
| 239 | This include provides declarations that we use to handle attributes, as |
| 240 | described a bit later. |
| 241 | |
| 242 | The :class:`Custom` type now has three data attributes in its C struct, |
| 243 | *first*, *last*, and *number*. The *first* and *last* variables are Python |
| 244 | strings containing first and last names. The *number* attribute is a C integer. |
| 245 | |
| 246 | The object structure is updated accordingly:: |
| 247 | |
| 248 | typedef struct { |
| 249 | PyObject_HEAD |
| 250 | PyObject *first; /* first name */ |
| 251 | PyObject *last; /* last name */ |
| 252 | int number; |
| 253 | } CustomObject; |
| 254 | |
| 255 | Because we now have data to manage, we have to be more careful about object |
| 256 | allocation and deallocation. At a minimum, we need a deallocation method:: |
| 257 | |
| 258 | static void |
| 259 | Custom_dealloc(CustomObject *self) |
| 260 | { |
| 261 | Py_XDECREF(self->first); |
| 262 | Py_XDECREF(self->last); |
| 263 | Py_TYPE(self)->tp_free((PyObject *) self); |
| 264 | } |
| 265 | |
| 266 | which is assigned to the :c:member:`~PyTypeObject.tp_dealloc` member:: |
| 267 | |
| 268 | .tp_dealloc = (destructor) Custom_dealloc, |
| 269 | |
| 270 | This method first decrements the reference counts of the two Python attributes. |
| 271 | :c:func:`Py_XDECREF` correctly handles the case where its argument is |
| 272 | *NULL* (which might happen here if ``tp_new`` failed midway). It then |
| 273 | calls the :c:member:`~PyTypeObject.tp_free` member of the object's type |
| 274 | (computed by ``Py_TYPE(self)``) to free the object's memory. Note that |
| 275 | the object's type might not be :class:`CustomType`, because the object may |
| 276 | be an instance of a subclass. |
| 277 | |
| 278 | .. note:: |
| 279 | The explicit cast to ``destructor`` above is needed because we defined |
| 280 | ``Custom_dealloc`` to take a ``CustomObject *`` argument, but the ``tp_dealloc`` |
| 281 | function pointer expects to receive a ``PyObject *`` argument. Otherwise, |
| 282 | the compiler will emit a warning. This is object-oriented polymorphism, |
| 283 | in C! |
| 284 | |
| 285 | We want to make sure that the first and last names are initialized to empty |
| 286 | strings, so we provide a ``tp_new`` implementation:: |
| 287 | |
| 288 | static PyObject * |
| 289 | Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds) |
| 290 | { |
| 291 | CustomObject *self; |
| 292 | self = (CustomObject *) type->tp_alloc(type, 0); |
| 293 | if (self != NULL) { |
| 294 | self->first = PyUnicode_FromString(""); |
| 295 | if (self->first == NULL) { |
| 296 | Py_DECREF(self); |
| 297 | return NULL; |
| 298 | } |
| 299 | self->last = PyUnicode_FromString(""); |
| 300 | if (self->last == NULL) { |
| 301 | Py_DECREF(self); |
| 302 | return NULL; |
| 303 | } |
| 304 | self->number = 0; |
| 305 | } |
| 306 | return (PyObject *) self; |
| 307 | } |
| 308 | |
| 309 | and install it in the :c:member:`~PyTypeObject.tp_new` member:: |
| 310 | |
| 311 | .tp_new = Custom_new, |
| 312 | |
| 313 | The ``tp_new`` handler is responsible for creating (as opposed to initializing) |
| 314 | objects of the type. It is exposed in Python as the :meth:`__new__` method. |
| 315 | It is not required to define a ``tp_new`` member, and indeed many extension |
| 316 | types will simply reuse :c:func:`PyType_GenericNew` as done in the first |
| 317 | version of the ``Custom`` type above. In this case, we use the ``tp_new`` |
| 318 | handler to initialize the ``first`` and ``last`` attributes to non-*NULL* |
| 319 | default values. |
| 320 | |
| 321 | ``tp_new`` is passed the type being instantiated (not necessarily ``CustomType``, |
| 322 | if a subclass is instantiated) and any arguments passed when the type was |
| 323 | called, and is expected to return the instance created. ``tp_new`` handlers |
| 324 | always accept positional and keyword arguments, but they often ignore the |
| 325 | arguments, leaving the argument handling to initializer (a.k.a. ``tp_init`` |
| 326 | in C or ``__init__`` in Python) methods. |
| 327 | |
| 328 | .. note:: |
| 329 | ``tp_new`` shouldn't call ``tp_init`` explicitly, as the interpreter |
| 330 | will do it itself. |
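The division of labour between the two handlers can be sketched in pure Python. The class below is a hypothetical analogue of the C code, not a translation of it, and the ``calls`` list exists only to record the order of events:

```python
calls = []


class Custom:
    def __new__(cls, *args, **kwargs):
        # Creation step: allocate and set safe defaults, ignore the arguments.
        calls.append("new")
        self = super().__new__(cls)
        self.first = ""
        self.last = ""
        self.number = 0
        return self

    def __init__(self, first="", last="", number=0):
        # Initialization step: invoked by the interpreter after __new__.
        calls.append("init")
        self.first = first
        self.last = last
        self.number = number


c = Custom("Ada", "Lovelace", number=1)
print(calls)
```

``__new__`` never calls ``__init__`` here, yet both run when the class is called, in that order.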
| 331 | |
| 332 | The ``tp_new`` implementation calls the :c:member:`~PyTypeObject.tp_alloc` |
| 333 | slot to allocate memory:: |
| 334 | |
| 335 | self = (CustomObject *) type->tp_alloc(type, 0); |
| 336 | |
| 337 | Since memory allocation may fail, we must check the :c:member:`~PyTypeObject.tp_alloc` |
| 338 | result against *NULL* before proceeding. |
| 339 | |
| 340 | .. note:: |
| 341 | We didn't fill the :c:member:`~PyTypeObject.tp_alloc` slot ourselves. Rather |
| 342 | :c:func:`PyType_Ready` fills it for us by inheriting it from our base class, |
| 343 | which is :class:`object` by default. Most types use the default allocation |
| 344 | strategy. |
| 345 | |
| 346 | .. note:: |
| 347 | If you are creating a co-operative :c:member:`~PyTypeObject.tp_new` (one |
| 348 | that calls a base type's :c:member:`~PyTypeObject.tp_new` or :meth:`__new__`), |
| 349 | you must *not* try to determine what method to call using method resolution |
| 350 | order at runtime. Always statically determine what type you are going to |
| 351 | call, and call its :c:member:`~PyTypeObject.tp_new` directly, or via |
| 352 | ``type->tp_base->tp_new``. If you do not do this, Python subclasses of your |
| 353 | type that also inherit from other Python-defined classes may not work correctly. |
| 354 | (Specifically, you may not be able to create instances of such subclasses |
| 355 | without getting a :exc:`TypeError`.) |
| 356 | |
| 357 | We also define an initialization function which accepts arguments to provide |
| 358 | initial values for our instance:: |
| 359 | |
| 360 | static int |
| 361 | Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) |
| 362 | { |
| 363 | static char *kwlist[] = {"first", "last", "number", NULL}; |
| 364 | PyObject *first = NULL, *last = NULL, *tmp; |
| 365 | |
| 366 | if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist, |
| 367 | &first, &last, |
| 368 | &self->number)) |
| 369 | return -1; |
| 370 | |
| 371 | if (first) { |
| 372 | tmp = self->first; |
| 373 | Py_INCREF(first); |
| 374 | self->first = first; |
| 375 | Py_XDECREF(tmp); |
| 376 | } |
| 377 | if (last) { |
| 378 | tmp = self->last; |
| 379 | Py_INCREF(last); |
| 380 | self->last = last; |
| 381 | Py_XDECREF(tmp); |
| 382 | } |
| 383 | return 0; |
| 384 | } |
| 385 | |
| 386 | by filling the :c:member:`~PyTypeObject.tp_init` slot. :: |
| 387 | |
| 388 | .tp_init = (initproc) Custom_init, |
| 389 | |
| 390 | The :c:member:`~PyTypeObject.tp_init` slot is exposed in Python as the |
| 391 | :meth:`__init__` method. It is used to initialize an object after it's |
| 392 | created. Initializers always accept positional and keyword arguments, |
| 393 | and they should return either ``0`` on success or ``-1`` on error. |
| 394 | |
| 395 | Unlike the ``tp_new`` handler, there is no guarantee that ``tp_init`` |
| 396 | is called at all (for example, the :mod:`pickle` module by default |
| 397 | doesn't call :meth:`__init__` on unpickled instances). It can also be |
| 398 | called multiple times. Anyone can call the :meth:`__init__` method on |
| 399 | our objects. For this reason, we have to be extra careful when assigning |
| 400 | the new attribute values. We might be tempted, for example, to assign the |
| 401 | ``first`` member like this:: |
| 402 | |
| 403 | if (first) { |
| 404 | Py_XDECREF(self->first); |
| 405 | Py_INCREF(first); |
| 406 | self->first = first; |
| 407 | } |
| 408 | |
| 409 | But this would be risky. Our type doesn't restrict the type of the |
| 410 | ``first`` member, so it could be any kind of object. It could have a |
| 411 | destructor that causes code to be executed that tries to access the |
| 412 | ``first`` member; or that destructor could release the |
| 413 | :term:`global interpreter lock` and let arbitrary code run in other |
| 414 | threads that accesses and modifies our object. |
| 415 | |
| 416 | To be paranoid and protect ourselves against this possibility, we almost |
| 417 | always reassign members before decrementing their reference counts. When |
| 418 | don't we have to do this? |
| 419 | |
| 420 | * when we absolutely know that the reference count is greater than 1; |
| 421 | |
| 422 | * when we know that deallocation of the object [#]_ will neither release |
| 423 | the :term:`GIL` nor cause any calls back into our type's code; |
| 424 | |
| 425 | * when decrementing a reference count in a :c:member:`~PyTypeObject.tp_dealloc` |
| 426 | handler on a type which doesn't support cyclic garbage collection [#]_. |
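Returning to the point that ``tp_init`` may be skipped or run more than once, the following pure-Python sketch illustrates both situations. The ``Person`` class is hypothetical. The sketch uses :func:`copy.copy`, which rebuilds an instance through the same ``__reduce_ex__`` protocol that :mod:`pickle` relies on, so that it does not depend on the class being importable by name:

```python
import copy


class Person:
    init_calls = 0  # counts how often __init__ has run, for illustration only

    def __init__(self, first="", last=""):
        type(self).init_calls += 1
        self.first = first
        self.last = last


p = Person("Ada", "Lovelace")    # __init__ runs once
clone = copy.copy(p)             # rebuilt from saved state, __init__ is skipped
p.__init__("Grace", "Hopper")    # nothing prevents a second, explicit call

print(Person.init_calls, clone.first, p.first)
```

The copy exists without ``__init__`` ever having run for it, and the original has been initialized twice, which is why an initializer has to cope with members that already hold values.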
| 427 | |
| 428 | We want to expose our instance variables as attributes. There are a |
| 429 | number of ways to do that. The simplest way is to provide member definitions:: |
| 430 | |
| 431 | static PyMemberDef Custom_members[] = { |
| 432 | {"first", T_OBJECT_EX, offsetof(CustomObject, first), 0, |
| 433 | "first name"}, |
| 434 | {"last", T_OBJECT_EX, offsetof(CustomObject, last), 0, |
| 435 | "last name"}, |
| 436 | {"number", T_INT, offsetof(CustomObject, number), 0, |
| 437 | "custom number"}, |
| 438 | {NULL} /* Sentinel */ |
| 439 | }; |
| 440 | |
| 441 | and put the definitions in the :c:member:`~PyTypeObject.tp_members` slot:: |
| 442 | |
| 443 | .tp_members = Custom_members, |
| 444 | |
| 445 | Each member definition has a member name, type, offset, access flags and |
| 446 | documentation string. See the :ref:`Generic-Attribute-Management` section |
| 447 | below for details. |
| 448 | |
| 449 | A disadvantage of this approach is that it doesn't provide a way to restrict the |
| 450 | types of objects that can be assigned to the Python attributes. We expect the |
| 451 | first and last names to be strings, but any Python objects can be assigned. |
| 452 | Further, the attributes can be deleted, setting the C pointers to *NULL*. Even |
| 453 | though we can make sure the members are initialized to non-*NULL* values, the |
| 454 | members can be set to *NULL* if the attributes are deleted. |
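The same behaviour can be observed from pure Python, because in CPython the descriptors created by ``__slots__`` are member descriptors of this kind. The sketch below uses a hypothetical ``Record`` class and is meant only as an illustration of the two drawbacks just described:

```python
class Record:
    __slots__ = ("first", "last")

    def __init__(self):
        self.first = ""
        self.last = ""


r = Record()
r.first = 42          # accepted: nothing restricts the value to str
del r.last            # allowed: the underlying slot is emptied

try:
    r.last
except AttributeError:
    last_missing = True
else:
    last_missing = False

print(type(Record.first).__name__, r.first, last_missing)
```

Reading a deleted member raises :exc:`AttributeError`, which corresponds to the C pointer having been set to *NULL*.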
| 455 | |
| 456 | We define a single method, :meth:`Custom.name()`, that outputs the object's name as the |
| 457 | concatenation of the first and last names. :: |
| 458 | |
| 459 | static PyObject * |
| 460 | Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored)) |
| 461 | { |
| 462 | if (self->first == NULL) { |
| 463 | PyErr_SetString(PyExc_AttributeError, "first"); |
| 464 | return NULL; |
| 465 | } |
| 466 | if (self->last == NULL) { |
| 467 | PyErr_SetString(PyExc_AttributeError, "last"); |
| 468 | return NULL; |
| 469 | } |
| 470 | return PyUnicode_FromFormat("%S %S", self->first, self->last); |
| 471 | } |
| 472 | |
| 473 | The method is implemented as a C function that takes a :class:`Custom` (or |
| 474 | :class:`Custom` subclass) instance as the first argument. Methods always take an |
| 475 | instance as the first argument. Methods often take positional and keyword |
| 476 | arguments as well, but in this case we don't take any and don't need to accept |
| 477 | a positional argument tuple or keyword argument dictionary. This method is |
| 478 | equivalent to the Python method: |
| 479 | |
| 480 | .. code-block:: python |
| 481 | |
| 482 | def name(self): |
| 483 | return "%s %s" % (self.first, self.last) |
| 484 | |
| 485 | Note that we have to check for the possibility that our :attr:`first` and |
| 486 | :attr:`last` members are *NULL*. This is because they can be deleted, in which |
| 487 | case they are set to *NULL*. It would be better to prevent deletion of these |
| 488 | attributes and to restrict the attribute values to be strings. We'll see how to |
| 489 | do that in the next section. |
| 490 | |
| 491 | Now that we've defined the method, we need to create an array of method |
| 492 | definitions:: |
| 493 | |
| 494 | static PyMethodDef Custom_methods[] = { |
| 495 | {"name", (PyCFunction) Custom_name, METH_NOARGS, |
| 496 | "Return the name, combining the first and last name" |
| 497 | }, |
| 498 | {NULL} /* Sentinel */ |
| 499 | }; |
| 500 | |
| 501 | (note that we used the :const:`METH_NOARGS` flag to indicate that the method |
| 502 | is expecting no arguments other than *self*) |
| 503 | |
| 504 | and assign it to the :c:member:`~PyTypeObject.tp_methods` slot:: |
| 505 | |
| 506 | .tp_methods = Custom_methods, |
| 507 | |
| 508 | Finally, we'll make our type usable as a base class for subclassing. We've |
| 509 | written our methods carefully so far so that they don't make any assumptions |
| 510 | about the type of the object being created or used, so all we need to do is |
| 511 | to add the :const:`Py_TPFLAGS_BASETYPE` to our class flag definition:: |
| 512 | |
| 513 | .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, |
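The effect of this flag can be checked from Python through the ``__flags__`` attribute of a type. This is a small sketch using built-in types only; the constant ``PY_TPFLAGS_BASETYPE`` is defined locally with the bit value taken from CPython's :file:`object.h`, since the flag is not exported to Python under that name:

```python
PY_TPFLAGS_BASETYPE = 1 << 10  # bit value of Py_TPFLAGS_BASETYPE in object.h

# int sets the flag and can be subclassed; bool does not set it.
int_is_base = bool(int.__flags__ & PY_TPFLAGS_BASETYPE)
bool_is_base = bool(bool.__flags__ & PY_TPFLAGS_BASETYPE)

try:
    class MyBool(bool):
        pass
except TypeError:
    subclass_error = True
else:
    subclass_error = False

print(int_is_base, bool_is_base, subclass_error)
```

A type defined without the flag, like the first version of ``Custom``, is rejected as a base class in the same way as ``bool``.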
| 514 | |
| 515 | We rename :c:func:`PyInit_custom` to :c:func:`PyInit_custom2`, update the |
| 516 | module name in the :c:type:`PyModuleDef` struct, and update the full class |
| 517 | name in the :c:type:`PyTypeObject` struct. |
| 518 | |
| 519 | Finally, we update our :file:`setup.py` file to build the new module: |
| 520 | |
| 521 | .. code-block:: python |
| 522 | |
| 523 | from distutils.core import setup, Extension |
| 524 | setup(name="custom", version="1.0", |
| 525 | ext_modules=[ |
| 526 | Extension("custom", ["custom.c"]), |
| 527 | Extension("custom2", ["custom2.c"]), |
| 528 | ]) |
| 529 | |
| 530 | |
| 531 | Providing finer control over data attributes |
| 532 | ============================================ |
| 533 | |
| 534 | In this section, we'll provide finer control over how the :attr:`first` and |
| 535 | :attr:`last` attributes are set in the :class:`Custom` example. In the previous |
| 536 | version of our module, the instance variables :attr:`first` and :attr:`last` |
| 537 | could be set to non-string values or even deleted. We want to make sure that |
| 538 | these attributes always contain strings. |
| 539 | |
| 540 | .. literalinclude:: ../includes/custom3.c |
| 541 | |
| 542 | |
| 543 | To provide greater control over the :attr:`first` and :attr:`last` attributes, |
| 544 | we'll use custom getter and setter functions. Here are the functions for |
| 545 | getting and setting the :attr:`first` attribute:: |
| 546 | |
| 547 | static PyObject * |
| 548 | Custom_getfirst(CustomObject *self, void *closure) |
| 549 | { |
| 550 | Py_INCREF(self->first); |
| 551 | return self->first; |
| 552 | } |
| 553 | |
| 554 | static int |
| 555 | Custom_setfirst(CustomObject *self, PyObject *value, void *closure) |
| 556 | { |
| 557 | PyObject *tmp; |
| 558 | if (value == NULL) { |
| 559 | PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute"); |
| 560 | return -1; |
| 561 | } |
| 562 | if (!PyUnicode_Check(value)) { |
| 563 | PyErr_SetString(PyExc_TypeError, |
| 564 | "The first attribute value must be a string"); |
| 565 | return -1; |
| 566 | } |
| 567 | tmp = self->first; |
| 568 | Py_INCREF(value); |
| 569 | self->first = value; |
| 570 | Py_DECREF(tmp); |
| 571 | return 0; |
| 572 | } |
| 573 | |
| 574 | The getter function is passed a :class:`Custom` object and a "closure", which is |
| 575 | a void pointer. In this case, the closure is ignored. (The closure supports an |
| 576 | advanced usage in which definition data is passed to the getter and setter. This |
| 577 | could, for example, be used to allow a single set of getter and setter functions |
| 578 | that decide the attribute to get or set based on data in the closure.) |
| 579 | |
| 580 | The setter function is passed the :class:`Custom` object, the new value, and the |
| 581 | closure. The new value may be *NULL*, in which case the attribute is being |
| 582 | deleted. In our setter, we raise an error if the attribute is deleted or if its |
| 583 | new value is not a string. |
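For comparison, here is a rough pure-Python analogue of such a getter and setter pair. The helper ``string_attribute`` is hypothetical. Its ``name`` argument plays the role of the closure, so that a single set of functions serves both attributes:

```python
def string_attribute(name):
    """Build a property that only accepts str and refuses deletion."""
    storage = "_" + name  # private attribute that holds the actual value

    def getter(self):
        return getattr(self, storage)

    def setter(self, value):
        if not isinstance(value, str):
            raise TypeError(f"The {name} attribute value must be a string")
        setattr(self, storage, value)

    def deleter(self):
        raise TypeError(f"Cannot delete the {name} attribute")

    return property(getter, setter, deleter)


class Custom:
    first = string_attribute("first")
    last = string_attribute("last")

    def __init__(self, first="", last=""):
        self.first = first
        self.last = last


c = Custom("Ada", "Lovelace")
errors = []
for action in (lambda: setattr(c, "first", 1), lambda: delattr(c, "last")):
    try:
        action()
    except TypeError as exc:
        errors.append(str(exc))

print(c.first, c.last, errors)
```

Both the invalid assignment and the deletion are rejected, and the stored values are left unchanged.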
| 584 | |
| 585 | We create an array of :c:type:`PyGetSetDef` structures:: |
| 586 | |
| 587 | static PyGetSetDef Custom_getsetters[] = { |
| 588 | {"first", (getter) Custom_getfirst, (setter) Custom_setfirst, |
| 589 | "first name", NULL}, |
| 590 | {"last", (getter) Custom_getlast, (setter) Custom_setlast, |
| 591 | "last name", NULL}, |
| 592 | {NULL} /* Sentinel */ |
| 593 | }; |
| 594 | |
| 595 | and register it in the :c:member:`~PyTypeObject.tp_getset` slot:: |
| 596 | |
| 597 | .tp_getset = Custom_getsetters, |
| 598 | |
| 599 | The last item in a :c:type:`PyGetSetDef` structure is the "closure" mentioned |
| 600 | above. In this case, we aren't using a closure, so we just pass *NULL*. |
| 601 | |
| 602 | We also remove the member definitions for these attributes:: |
| 603 | |
| 604 | static PyMemberDef Custom_members[] = { |
| 605 | {"number", T_INT, offsetof(CustomObject, number), 0, |
| 606 | "custom number"}, |
| 607 | {NULL} /* Sentinel */ |
| 608 | }; |
| 609 | |
| 610 | We also need to update the :c:member:`~PyTypeObject.tp_init` handler to only |
| 611 | allow strings [#]_ to be passed:: |
| 612 | |
| 613 | static int |
| 614 | Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) |
| 615 | { |
| 616 | static char *kwlist[] = {"first", "last", "number", NULL}; |
| 617 | PyObject *first = NULL, *last = NULL, *tmp; |
| 618 | |
| 619 | if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist, |
| 620 | &first, &last, |
| 621 | &self->number)) |
| 622 | return -1; |
| 623 | |
| 624 | if (first) { |
| 625 | tmp = self->first; |
| 626 | Py_INCREF(first); |
| 627 | self->first = first; |
| 628 | Py_DECREF(tmp); |
| 629 | } |
| 630 | if (last) { |
| 631 | tmp = self->last; |
| 632 | Py_INCREF(last); |
| 633 | self->last = last; |
| 634 | Py_DECREF(tmp); |
| 635 | } |
| 636 | return 0; |
| 637 | } |
| 638 | |
| 639 | With these changes, we can ensure that the ``first`` and ``last`` members are |
| 640 | never *NULL* so we can remove checks for *NULL* values in almost all cases. |
| 641 | This means that most of the :c:func:`Py_XDECREF` calls can be converted to |
| 642 | :c:func:`Py_DECREF` calls. The only place we can't change these calls is in |
| 643 | the ``tp_dealloc`` implementation, where there is the possibility that the |
| 644 | initialization of these members failed in ``tp_new``. |
| 645 | |
| 646 | We also rename the module initialization function and module name in the |
| 647 | initialization function, as we did before, and we add an extra definition to the |
| 648 | :file:`setup.py` file. |
| 649 | |
| 650 | |
| 651 | Supporting cyclic garbage collection |
| 652 | ==================================== |
| 653 | |
| 654 | Python has a :term:`cyclic garbage collector (GC) <garbage collection>` that |
| 655 | can identify unneeded objects even when their reference counts are not zero. |
| 656 | This can happen when objects are involved in cycles. For example, consider: |
| 657 | |
| 658 | .. code-block:: pycon |
| 659 | |
| 660 | >>> l = [] |
| 661 | >>> l.append(l) |
| 662 | >>> del l |
| 663 | |
| 664 | In this example, we create a list that contains itself. When we delete it, it |
| 665 | still has a reference from itself. Its reference count doesn't drop to zero. |
| 666 | Fortunately, Python's cyclic garbage collector will eventually figure out that |
| 667 | the list is garbage and free it. |
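The collection of such a cycle can be observed with a weak reference. The following is a minimal sketch for CPython, using a hypothetical ``Node`` class; automatic collection is switched off temporarily so that the moment of collection is predictable:

```python
import gc
import weakref


class Node:
    """Hypothetical container that can hold a reference to itself."""


was_enabled = gc.isenabled()
gc.disable()                 # avoid an automatic collection during the demo
try:
    n = Node()
    n.some_attribute = n     # the object now refers to itself
    probe = weakref.ref(n)   # lets us observe the object without owning it
    del n
    alive_before = probe() is not None   # the reference count never hit zero
    gc.collect()
    alive_after = probe() is not None    # the cyclic GC reclaimed the object
finally:
    if was_enabled:
        gc.enable()

print(alive_before, alive_after)
```

The object survives the ``del`` statement and disappears only once the cyclic garbage collector has run.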
| 668 | |
| 669 | In the second version of the :class:`Custom` example, we allowed any kind of |
| 670 | object to be stored in the :attr:`first` or :attr:`last` attributes [#]_. |
| 671 | Besides, in the second and third versions, we allowed subclassing |
| 672 | :class:`Custom`, and subclasses may add arbitrary attributes. For either of |
| 673 | those two reasons, :class:`Custom` objects can participate in cycles: |
| 674 | |
| 675 | .. code-block:: pycon |
| 676 | |
| 677 | >>> import custom3 |
| 678 | >>> class Derived(custom3.Custom): pass |
| 679 | ... |
| 680 | >>> n = Derived() |
| 681 | >>> n.some_attribute = n |
| 682 | |
| 683 | To allow a :class:`Custom` instance participating in a reference cycle to |
| 684 | be properly detected and collected by the cyclic GC, our :class:`Custom` type |
| 685 | needs to fill two additional slots and to enable a flag that enables these slots: |
| 686 | |
| 687 | .. literalinclude:: ../includes/custom4.c |
| 688 | |
| 689 | |
| 690 | First, the traversal method lets the cyclic GC know about subobjects that could |
| 691 | participate in cycles:: |
| 692 | |
| 693 | static int |
| 694 | Custom_traverse(CustomObject *self, visitproc visit, void *arg) |
| 695 | { |
| 696 | int vret; |
| 697 | if (self->first) { |
| 698 | vret = visit(self->first, arg); |
| 699 | if (vret != 0) |
| 700 | return vret; |
| 701 | } |
| 702 | if (self->last) { |
| 703 | vret = visit(self->last, arg); |
| 704 | if (vret != 0) |
| 705 | return vret; |
| 706 | } |
| 707 | return 0; |
| 708 | } |
| 709 | |
| 710 | For each subobject that can participate in cycles, we need to call the |
| 711 | :c:func:`visit` function, which is passed to the traversal method. The |
| 712 | :c:func:`visit` function takes as arguments the subobject and the extra argument |
| 713 | *arg* passed to the traversal method. It returns an integer value; if that |
| 714 | value is non-zero, the traversal method must stop and return it immediately. |
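The result of a type's traversal can be inspected from Python with :func:`gc.get_referents`, which is documented as returning the objects visited by the argument's C-level ``tp_traverse`` method. The sketch below uses built-in lists rather than the ``Custom`` type, and ``refers_to`` is a hypothetical helper:

```python
import gc


def refers_to(container, target):
    """Return True if *container*'s traversal reports *target* as a direct referent."""
    return any(obj is target for obj in gc.get_referents(container))


inner = []
outer = [inner]
outer.append(outer)  # outer now refers to itself

print(refers_to(outer, inner))  # traversal visits the nested list
print(refers_to(outer, outer))  # ... and the self-reference
print(refers_to(inner, outer))  # inner holds nothing, so nothing is visited
```

A subobject that the traversal method forgets to visit would be missing from this list, and the collector would not be able to see cycles that pass through it.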
| 715 | |
| 716 | Python provides a :c:func:`Py_VISIT` macro that automates calling visit |
| 717 | functions. With :c:func:`Py_VISIT`, we can minimize the amount of boilerplate |
| 718 | in ``Custom_traverse``:: |
| 719 | |
| 720 | static int |
| 721 | Custom_traverse(CustomObject *self, visitproc visit, void *arg) |
| 722 | { |
| 723 | Py_VISIT(self->first); |
| 724 | Py_VISIT(self->last); |
| 725 | return 0; |
| 726 | } |
| 727 | |
| 728 | .. note:: |
| 729 | The :c:member:`~PyTypeObject.tp_traverse` implementation must name its |
| 730 | arguments exactly *visit* and *arg* in order to use :c:func:`Py_VISIT`. |
| 731 | |
| 732 | Second, we need to provide a method for clearing any subobjects that can |
| 733 | participate in cycles:: |
| 734 | |
| 735 | static int |
| 736 | Custom_clear(CustomObject *self) |
| 737 | { |
| 738 | Py_CLEAR(self->first); |
| 739 | Py_CLEAR(self->last); |
| 740 | return 0; |
| 741 | } |
| 742 | |
| 743 | Notice the use of the :c:func:`Py_CLEAR` macro. It is the recommended and safe |
| 744 | way to clear data attributes of arbitrary types while decrementing |
| 745 | their reference counts. If you were to call :c:func:`Py_XDECREF` instead |
| 746 | on the attribute before setting it to *NULL*, there is a possibility |
| 747 | that the attribute's destructor would call back into code that reads the |
| 748 | attribute while it still points to the dying object (*especially* if there is a reference cycle). |
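The ordering matters because destructors can run arbitrary code. As a Python-level analogy only (it does not call the C macros), the sketch below records what a finalizer sees when it reads its owner's attribute back. ``Owner``, ``Nosy`` and ``clear_and_record`` are hypothetical names. CPython's own attribute assignment stores the new value before releasing the old one, which is the same ordering :c:func:`Py_CLEAR` uses, so the finalizer is expected to observe the new value rather than the object being destroyed:

```python
class Owner:
    """Hypothetical holder with a single attribute, standing in for CustomObject."""

    def __init__(self):
        self.first = None


class Nosy:
    """Hypothetical attribute value whose finalizer reads the owner's attribute back."""

    def __init__(self, owner, log):
        self.owner = owner
        self.log = log

    def __del__(self):
        # Runs while the owner is in the middle of replacing its attribute.
        self.log.append(self.owner.first)


def clear_and_record():
    """Return what the finalizer observed in ``owner.first`` during the clear."""
    log = []
    owner = Owner()
    owner.first = Nosy(owner, log)  # owner -> Nosy -> owner: a reference cycle
    owner.first = None              # drops the last reference to the Nosy object
    return log


print(clear_and_record())
```

Had the old value been released before the attribute was reset, the finalizer would instead have found a reference to the object that is being torn down.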
| 749 | |
| 750 | .. note:: |
| 751 | You could emulate :c:func:`Py_CLEAR` by writing:: |
| 752 | |
| 753 | PyObject *tmp; |
| 754 | tmp = self->first; |
| 755 | self->first = NULL; |
| 756 | Py_XDECREF(tmp); |
| 757 | |
| 758 | Nevertheless, it is much easier and less error-prone to always |
| 759 | use :c:func:`Py_CLEAR` when deleting an attribute. Don't |
| 760 | try to micro-optimize at the expense of robustness! |
| 761 | |
| 762 | The deallocator ``Custom_dealloc`` may call arbitrary code when clearing |
| 763 | attributes. This means the cyclic GC can be triggered inside the function. |
| 764 | Since the GC assumes that the reference count of a tracked object is not zero, we need to untrack the object |
| 765 | from the GC by calling :c:func:`PyObject_GC_UnTrack` before clearing members. |
| 766 | Here is our reimplemented deallocator using :c:func:`PyObject_GC_UnTrack` |
| 767 | and ``Custom_clear``:: |
| 768 | |
| 769 | static void |
| 770 | Custom_dealloc(CustomObject *self) |
| 771 | { |
| 772 | PyObject_GC_UnTrack(self); |
| 773 | Custom_clear(self); |
| 774 | Py_TYPE(self)->tp_free((PyObject *) self); |
| 775 | } |
| 776 | |
| 777 | Finally, we add the :const:`Py_TPFLAGS_HAVE_GC` flag to the class flags:: |
| 778 | |
| 779 | .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, |
| 780 | |
| 781 | That's pretty much it. If we had written custom :c:member:`~PyTypeObject.tp_alloc` or |
| 782 | :c:member:`~PyTypeObject.tp_free` handlers, we'd need to modify them for cyclic |
| 783 | garbage collection. Most extensions will use the versions automatically provided. |
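Whether a type participates in cyclic garbage collection can be checked from Python. This is a small sketch under one stated assumption: that :const:`Py_TPFLAGS_HAVE_GC` is bit 14 of the type flags, as defined in CPython's ``object.h``. ``type_has_gc`` is a hypothetical helper:

```python
import gc

# Assumption: bit 14 is the value of Py_TPFLAGS_HAVE_GC in CPython's object.h.
PY_TPFLAGS_HAVE_GC = 1 << 14


def type_has_gc(tp):
    """Return True if the type object *tp* has the HAVE_GC flag set."""
    return bool(tp.__flags__ & PY_TPFLAGS_HAVE_GC)


print(type_has_gc(list))   # containers can form cycles, so list sets the flag
print(type_has_gc(int))    # ints cannot refer to other objects
print(gc.is_tracked([]))   # instances of GC types are tracked by the collector
print(gc.is_tracked(1))
```

The same check applied to a type built from ``custom4.c`` should report the flag as set, since that is what the ``.tp_flags`` line above requests.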
| 784 | |
| 785 | |
| 786 | Subclassing other types |
| 787 | ======================= |
| 788 | |
| 789 | It is possible to create new extension types that are derived from existing |
| 790 | types. It is easiest to inherit from the built-in types, since an extension can |
| 791 | easily use the :c:type:`PyTypeObject` it needs. It can be difficult to share |
| 792 | these :c:type:`PyTypeObject` structures between extension modules. |
| 793 | |
| 794 | In this example we will create a :class:`SubList` type that inherits from the |
| 795 | built-in :class:`list` type. The new type will be completely compatible with |
| 796 | regular lists, but will have an additional :meth:`increment` method that |
| 797 | increases an internal counter: |
| 798 | |
| 799 | .. code-block:: pycon |
| 800 | |
| 801 | >>> import sublist |
| 802 | >>> s = sublist.SubList(range(3)) |
| 803 | >>> s.extend(s) |
| 804 | >>> print(len(s)) |
| 805 | 6 |
| 806 | >>> print(s.increment()) |
| 807 | 1 |
| 808 | >>> print(s.increment()) |
| 809 | 2 |
| 810 | |
| 811 | .. literalinclude:: ../includes/sublist.c |
| 812 | |
| 813 | |
| 814 | As you can see, the source code closely resembles the :class:`Custom` examples in |
| 815 | previous sections. We will break down the main differences between them. :: |
| 816 | |
| 817 | typedef struct { |
| 818 | PyListObject list; |
| 819 | int state; |
| 820 | } SubListObject; |
| 821 | |
| 822 | The primary difference for derived type objects is that the base type's |
| 823 | object structure must be the first member. The base type will already include |
| 824 | the :c:func:`PyObject_HEAD` at the beginning of its structure. |
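The layout rule can be illustrated without compiling anything by using :mod:`ctypes`. This is a sketch with hypothetical structures: ``BaseObject`` and ``DerivedObject`` are simplified stand-ins, not the real ``PyListObject`` or ``SubListObject`` layouts:

```python
import ctypes


class BaseObject(ctypes.Structure):
    """Hypothetical stand-in for the base type's struct (not the real PyListObject)."""
    _fields_ = [("refcnt", ctypes.c_ssize_t), ("size", ctypes.c_ssize_t)]


class DerivedObject(ctypes.Structure):
    """Hypothetical derived struct: the base struct comes first, extra state after."""
    _fields_ = [("base", BaseObject), ("state", ctypes.c_int)]


derived = DerivedObject(BaseObject(1, 3), 7)

# The embedded base sits at offset 0, so both structs start at the same address.
print(DerivedObject.base.offset)

# Reinterpret a pointer to the derived struct as a pointer to the base struct.
as_base = ctypes.cast(ctypes.pointer(derived), ctypes.POINTER(BaseObject))
print(as_base.contents.size)
```

Because the base structure is the first member, code that only knows about the base layout reads the right fields through the reinterpreted pointer. If ``state`` came first, the same cast would read the wrong memory.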
| 825 | |
| 826 | When a Python object is a :class:`SubList` instance, its ``PyObject *`` pointer |
| 827 | can be safely cast to both ``PyListObject *`` and ``SubListObject *``:: |
| 828 | |
| 829 | static int |
| 830 | SubList_init(SubListObject *self, PyObject *args, PyObject *kwds) |
| 831 | { |
| 832 | if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0) |
| 833 | return -1; |
| 834 | self->state = 0; |
| 835 | return 0; |
| 836 | } |
| 837 | |
| 838 | We see above how to call through to the :attr:`__init__` method of the base |
| 839 | type. |
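For comparison, the same call-through pattern looks like this in pure Python. This is only a sketch of the behavior described for ``sublist.SubList``, not a translation of ``sublist.c``, and it assumes that ``increment`` returns the counter after increasing it, as the interactive session above suggests:

```python
class SubList(list):
    """Pure-Python sketch of the behaviour described for sublist.SubList."""

    def __init__(self, *args, **kwds):
        super().__init__(*args, **kwds)  # let list initialize itself first
        self.state = 0                   # then set up the subclass's own state

    def increment(self):
        """Increase the internal counter and return its new value."""
        self.state += 1
        return self.state


s = SubList(range(3))
s.extend(s)
print(len(s))
print(s.increment())
print(s.increment())
```

As in the C version, the base initializer runs before the subclass touches its own state, so a failure in the base leaves nothing half-initialized.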
| 840 | |
| 841 | This pattern is important when writing a type with custom |
| 842 | :c:member:`~PyTypeObject.tp_new` and :c:member:`~PyTypeObject.tp_dealloc` |
| 843 | members. The :c:member:`~PyTypeObject.tp_new` handler should not itself |
| 844 | allocate the memory for the object with its :c:member:`~PyTypeObject.tp_alloc`, |
| 845 | but let the base class handle it by calling the base class's :c:member:`~PyTypeObject.tp_new`. |
| 846 | |
| 847 | The :c:type:`PyTypeObject` struct supports a :c:member:`~PyTypeObject.tp_base` |
| 848 | specifying the type's concrete base class. Due to cross-platform compiler |
| 849 | issues, you can't fill that field directly with a reference to |
| 850 | :c:type:`PyList_Type`; it should be done later in the module initialization |
| 851 | function:: |
| 852 | |
| 853 | PyMODINIT_FUNC |
| 854 | PyInit_sublist(void) |
| 855 | { |
| 856 | PyObject* m; |
| 857 | SubListType.tp_base = &PyList_Type; |
| 858 | if (PyType_Ready(&SubListType) < 0) |
| 859 | return NULL; |
| 860 | |
| 861 | m = PyModule_Create(&sublistmodule); |
| 862 | if (m == NULL) |
| 863 | return NULL; |
| 864 | |
| 865 | Py_INCREF(&SubListType); |
| 866 | PyModule_AddObject(m, "SubList", (PyObject *) &SubListType); |
| 867 | return m; |
| 868 | } |
| 869 | |
| 870 | Before calling :c:func:`PyType_Ready`, the type structure must have the |
| 871 | :c:member:`~PyTypeObject.tp_base` slot filled in. When we are deriving an |
| 872 | existing type, it is not necessary to fill out the :c:member:`~PyTypeObject.tp_alloc` |
| 873 | slot with :c:func:`PyType_GenericAlloc` -- the allocation function from the base |
| 874 | type will be inherited. |
| 875 | |
| 876 | After that, calling :c:func:`PyType_Ready` and adding the type object to the |
| 877 | module is the same as with the basic :class:`Custom` examples. |
| 878 | |
| 879 | |
| 880 | .. rubric:: Footnotes |
| 881 | |
| 882 | .. [#] This is true when we know that the object is a basic type, like a string or a |
| 883 | float. |
| 884 | |
| 885 | .. [#] We relied on this in the :c:member:`~PyTypeObject.tp_dealloc` handler |
| 886 | in this example, because our type doesn't support garbage collection. |
| 887 | |
| 888 | .. [#] We now know that the first and last members are strings, so perhaps we |
| 889 | could be less careful about decrementing their reference counts. However, |
| 890 | we accept instances of string subclasses. Even though deallocating normal |
| 891 | strings won't call back into our objects, we can't guarantee that deallocating |
| 892 | an instance of a string subclass won't call back into our objects. |
| 893 | |
| 894 | .. [#] Also, even with our attributes restricted to string instances, the user |
| 895 | could pass arbitrary :class:`str` subclasses and therefore still create |
| 896 | reference cycles. |