| .. _xml: |
| |
| XML Processing Modules |
| ====================== |
| |
| .. module:: xml |
| :synopsis: Package containing XML processing modules |
| |
| .. sectionauthor:: Christian Heimes <christian@python.org> |
| .. sectionauthor:: Georg Brandl <georg@python.org> |
| |
| **Source code:** :source:`Lib/xml/` |
| |
| -------------- |
| |
| Python's interfaces for processing XML are grouped in the ``xml`` package. |
| |
| .. warning:: |
| |
| The XML modules are not secure against erroneous or maliciously |
| constructed data. If you need to parse untrusted or |
| unauthenticated data see the :ref:`xml-vulnerabilities` and |
| :ref:`defused-packages` sections. |
| |
| It is important to note that modules in the :mod:`xml` package require that |
| there be at least one SAX-compliant XML parser available. The Expat parser is |
| included with Python, so the :mod:`xml.parsers.expat` module will always be |
| available. |
| |
| The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages are the |
| definition of the Python bindings for the DOM and SAX interfaces. |
| |
| The XML handling submodules are: |
| |
| * :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight |
| XML processor |
| |
| .. |
| |
| * :mod:`xml.dom`: the DOM API definition |
| * :mod:`xml.dom.minidom`: a minimal DOM implementation |
| * :mod:`xml.dom.pulldom`: support for building partial DOM trees |
| |
| .. |
| |
| * :mod:`xml.sax`: SAX2 base classes and convenience functions |
| * :mod:`xml.parsers.expat`: the Expat parser binding |
| |
| |
| .. _xml-vulnerabilities: |
| |
| XML vulnerabilities |
| ------------------- |
| |
| The XML processing modules are not secure against maliciously constructed data. |
| An attacker can abuse XML features to carry out denial of service attacks, |
| access local files, generate network connections to other machines, or |
| circumvent firewalls. |
| |
| The following table gives an overview of the known attacks and whether |
| the various modules are vulnerable to them. |
| |
| ========================= ======== ========= ========= ======== ========= |
| kind sax etree minidom pulldom xmlrpc |
| ========================= ======== ========= ========= ======== ========= |
| billion laughs **Yes** **Yes** **Yes** **Yes** **Yes** |
| quadratic blowup **Yes** **Yes** **Yes** **Yes** **Yes** |
| external entity expansion **Yes** No (1) No (2) **Yes** No (3) |
| `DTD`_ retrieval **Yes** No No **Yes** No |
| decompression bomb No No No No **Yes** |
| ========================= ======== ========= ========= ======== ========= |
| |
| 1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a |
| :exc:`ParserError` when an entity occurs. |
| 2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns |
| the unexpanded entity verbatim. |
| 3. :mod:`xmlrpclib` doesn't expand external entities and omits them. |
| |
| |
| billion laughs / exponential entity expansion |
| The `Billion Laughs`_ attack -- also known as exponential entity expansion -- |
| uses multiple levels of nested entities. Each entity refers to another entity |
| several times, and the final entity definition contains a small string. |
| The exponential expansion results in several gigabytes of text and |
| consumes lots of memory and CPU time. |
| |
| quadratic blowup entity expansion |
| A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses |
| entity expansion, too. Instead of nested entities it repeats one large entity |
| with a couple of thousand chars over and over again. The attack isn't as |
| efficient as the exponential case but it avoids triggering parser countermeasures |
| that forbid deeply-nested entities. |
| |
| external entity expansion |
| Entity declarations can contain more than just text for replacement. They can |
| also point to external resources or local files. The XML |
| parser accesses the resource and embeds the content into the XML document. |
| |
| `DTD`_ retrieval |
| Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type |
| definitions from remote or local locations. The feature has similar |
| implications as the external entity expansion issue. |
| |
| decompression bomb |
| Decompression bombs (aka `ZIP bomb`_) apply to all XML libraries |
| that can parse compressed XML streams such as gzipped HTTP streams or |
| LZMA-compressed |
| files. For an attacker it can reduce the amount of transmitted data by three |
| magnitudes or more. |
| |
| The documentation for `defusedxml`_ on PyPI has further information about |
| all known attack vectors with examples and references. |
| |
| .. _defused-packages: |
| |
| The :mod:`defusedxml` and :mod:`defusedexpat` Packages |
| ------------------------------------------------------ |
| |
| `defusedxml`_ is a pure Python package with modified subclasses of all stdlib |
| XML parsers that prevent any potentially malicious operation. Use of this |
| package is recommended for any server code that parses untrusted XML data. The |
| package also ships with example exploits and extended documentation on more |
| XML exploits such as XPath injection. |
| |
| `defusedexpat`_ provides a modified libexpat and a patched |
| :mod:`pyexpat` module that have countermeasures against entity expansion |
| DoS attacks. The :mod:`defusedexpat` module still allows a sane and configurable amount of entity |
| expansions. The modifications may be included in some future release of Python, |
| but will not be included in any bugfix releases of |
| Python because they break backward compatibility. |
| |
| |
| .. _defusedxml: https://pypi.python.org/pypi/defusedxml/ |
| .. _defusedexpat: https://pypi.python.org/pypi/defusedexpat/ |
| .. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs |
| .. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb |
| .. _DTD: https://en.wikipedia.org/wiki/Document_type_definition |