.. _xml:

XML Processing Modules
======================

.. module:: xml
   :synopsis: Package containing XML processing modules
.. sectionauthor:: Christian Heimes <christian@python.org>
.. sectionauthor:: Georg Brandl <georg@python.org>


Python's interfaces for processing XML are grouped in the ``xml`` package.

.. warning::

   The XML modules are not secure against erroneous or maliciously
   constructed data.  If you need to parse untrusted or unauthenticated data,
   see :ref:`xml-vulnerabilities`.

The modules in the :mod:`xml` package require that at least one SAX-compliant
XML parser be available.  The Expat parser is included with Python, so the
:mod:`xml.parsers.expat` module will always be available.

The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
definition of the Python bindings for the DOM and SAX interfaces.

The XML handling submodules are:

* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
  XML processor

..

* :mod:`xml.dom`: the DOM API definition
* :mod:`xml.dom.minidom`: a minimal DOM implementation
* :mod:`xml.dom.pulldom`: support for building partial DOM trees

..

* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding
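
As a rough orientation, the sketch below reads the same small, trusted document
with three of the modules listed above.  The sample document, the ``DOC``
constant and the helper names are assumptions made for this illustration and
are not part of the :mod:`xml` package; the code is written for Python 3.

.. code-block:: python

   import xml.dom.minidom
   import xml.etree.ElementTree as ET
   import xml.sax

   # Hypothetical, trusted sample document used only for illustration.
   DOC = "<inventory><item name='apple'/><item name='pear'/></inventory>"


   def names_with_etree(text):
       """Collect the ``name`` attributes using the ElementTree API."""
       root = ET.fromstring(text)
       return [item.get("name") for item in root.iter("item")]


   def names_with_minidom(text):
       """Collect the ``name`` attributes using the minimal DOM implementation."""
       dom = xml.dom.minidom.parseString(text)
       return [node.getAttribute("name")
               for node in dom.getElementsByTagName("item")]


   class _NameCollector(xml.sax.ContentHandler):
       """SAX handler that records the ``name`` attribute of each ``item``."""

       def __init__(self):
           xml.sax.ContentHandler.__init__(self)
           self.names = []

       def startElement(self, tag, attrs):
           if tag == "item":
               self.names.append(attrs.get("name"))


   def names_with_sax(text):
       """Collect the ``name`` attributes using the event-driven SAX API."""
       handler = _NameCollector()
       xml.sax.parseString(text.encode("utf-8"), handler)
       return handler.names

All three helpers are expected to return the same list of names.  Which API to
prefer depends on whether a whole tree (ElementTree, minidom) or a stream of
events (SAX) suits the task better.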


.. _xml-vulnerabilities:

XML vulnerabilities
===================

The XML processing modules are not secure against maliciously constructed data.
An attacker can abuse vulnerabilities for, e.g., denial of service attacks, to
access local files, to generate network connections to other machines, or to
circumvent firewalls.  The attacks on XML abuse unfamiliar features like inline
`DTD`_ (document type definition) with entities.

The following table gives an overview of the known attacks and whether the
various modules are vulnerable to them.

=========================  ========  =========  =========  ========  =========
kind                       sax       etree      minidom    pulldom   xmlrpc
=========================  ========  =========  =========  ========  =========
billion laughs             **Yes**   **Yes**    **Yes**    **Yes**   **Yes**
quadratic blowup           **Yes**   **Yes**    **Yes**    **Yes**   **Yes**
external entity expansion  **Yes**   No (1)     No (2)     **Yes**   No (3)
DTD retrieval              **Yes**   No         No         **Yes**   No
decompression bomb         No        No         No         No        **Yes**
=========================  ========  =========  =========  ========  =========

1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
   ``ParseError`` when an entity occurs.
2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
   the unexpanded entity verbatim.
3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
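
The behaviour described in note 1 can be checked with a short experiment.  The
following sketch assumes Python 3; the document, the entity name ``ext`` and
the helper ``external_entity_is_rejected`` are made up for illustration, and
the file named by the system identifier is hypothetical.

.. code-block:: python

   import xml.etree.ElementTree as ET

   # Hypothetical document.  The path in the SYSTEM identifier is purely
   # illustrative; this sketch relies on ElementTree not resolving it.
   XXE_DOC = (
       '<!DOCTYPE data [<!ENTITY ext SYSTEM "file:///nonexistent/example.txt">]>'
       "<data>&ext;</data>"
   )


   def external_entity_is_rejected(text):
       """Return True if parsing *text* fails with ParseError."""
       try:
           ET.fromstring(text)
       except ET.ParseError:
           return True
       return False

Calling ``external_entity_is_rejected(XXE_DOC)`` is expected to return ``True``,
whereas a document without entity references, such as ``<data>ok</data>``,
parses normally.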


billion laughs / exponential entity expansion
  The `Billion Laughs`_ attack -- also known as exponential entity expansion --
  uses multiple levels of nested entities.  Each entity refers to another entity
  several times, and the final entity definition contains a small string.  When
  the outermost entity is expanded, the small string grows to several
  gigabytes.  The exponential expansion consumes lots of CPU time, too.

quadratic blowup entity expansion
  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
  entity expansion, too.  Instead of nested entities, it repeats one large
  entity of a couple of thousand characters over and over again.  The attack
  isn't as efficient as the exponential case, but it avoids triggering parser
  countermeasures against heavily nested entities.

external entity expansion
  Entity declarations can contain more than just text for replacement.  They can
  also point to external resources by public identifiers or system identifiers.
  System identifiers are standard URIs or can refer to local files.  The XML
  parser retrieves the resource with, e.g., HTTP or FTP requests and embeds the
  content into the XML document.

DTD retrieval
  Some XML libraries, like Python's :mod:`xml.dom.pulldom`, retrieve document
  type definitions from remote or local locations.  This feature has
  implications similar to those of the external entity expansion issue.

decompression bomb
  The issue of decompression bombs (aka `ZIP bomb`_) applies to all XML
  libraries that can parse compressed XML streams, like gzipped HTTP streams or
  LZMA-ed files.  For an attacker, it can reduce the amount of transmitted data
  by three orders of magnitude or more.
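
To give a sense of scale for the two entity expansion attacks described above,
the following sketch computes the expanded size from the shape of the entity
declarations and then parses one deliberately tiny, harmless quadratic
document.  The function names and the sizes are assumptions chosen for
illustration; the figures in the usage note are the commonly cited textbook
values, not measurements.

.. code-block:: python

   import xml.etree.ElementTree as ET


   def billion_laughs_size(base_len, refs_per_level, levels):
       """Characters produced when each of *levels* nested entities refers
       *refs_per_level* times to the entity below it."""
       return base_len * refs_per_level ** levels


   def quadratic_blowup_size(entity_len, references):
       """Characters produced when a single entity of *entity_len*
       characters is referenced *references* times."""
       return entity_len * references


   def build_quadratic_document(entity_len, references):
       """Build a small document that repeats one internal entity."""
       entity = "x" * entity_len
       body = "&a;" * references
       return '<!DOCTYPE d [<!ENTITY a "%s">]><d>%s</d>' % (entity, body)


   # Deliberately tiny: 100 characters referenced 100 times.
   small_doc = build_quadratic_document(100, 100)
   expanded_text = ET.fromstring(small_doc).text

With a three-character string, ten references per level and nine levels,
``billion_laughs_size(3, 10, 9)`` gives 3,000,000,000 characters.  The small
document above is only a few hundred bytes long, yet ``expanded_text`` holds
10,000 characters, which matches ``quadratic_blowup_size(100, 100)``.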

The documentation of `defusedxml`_ on PyPI has further information about
all known attack vectors with examples and references.
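
As one small, harmless illustration of the decompression bomb entry above, the
sketch below measures how well highly repetitive XML compresses and shows one
possible way to cap the amount of data read from a gzip stream.  It assumes
Python 3.2 or later for :func:`gzip.compress`; ``PAYLOAD``,
``compression_ratio`` and ``read_limited`` are hypothetical names, and the size
limit is an arbitrary choice.

.. code-block:: python

   import gzip
   import io

   # One million bytes of highly repetitive XML-like data.
   PAYLOAD = b"<a/>" * 250000


   def compression_ratio(payload):
       """Return the ratio of uncompressed size to gzip-compressed size."""
       compressed = gzip.compress(payload)
       return len(payload) / float(len(compressed))


   def read_limited(compressed, limit):
       """Decompress at most *limit* bytes; raise ValueError beyond that."""
       with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
           # Ask for one byte more than allowed to detect oversized input.
           data = stream.read(limit + 1)
       if len(data) > limit:
           raise ValueError("decompressed data exceeds %d bytes" % limit)
       return data

For input this repetitive, ``compression_ratio(PAYLOAD)`` should come out in
the hundreds.  Reading through a helper such as ``read_limited`` stops once
the cap is exceeded, so an oversized stream is never fully expanded in memory.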

defused packages
----------------

These external packages are recommended for any code that parses
untrusted XML data.

`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
XML parsers that prevent any potentially malicious operation.  The
package also ships with example exploits and extended documentation on more
XML exploits like XPath injection.

`defusedexpat`_ provides a modified libexpat and a patched replacement
:mod:`pyexpat` extension module with countermeasures against entity expansion
DoS attacks.  Defusedexpat still allows a sane and configurable number of
entity expansions.  The modifications will be merged into future releases of
Python.

The workarounds and modifications are not included in patch releases as they
break backward compatibility.  After all, inline DTDs and entity expansion are
well-defined XML features.
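
The kind of restriction involved can be sketched with the standard library
alone.  The example below is not the API of `defusedxml`_ or `defusedexpat`_;
it is a minimal illustration, assuming that rejecting every document with a
DOCTYPE declaration is acceptable for the application.  ``DoctypeForbidden``
and ``element_names_without_dtd`` are hypothetical names.

.. code-block:: python

   import xml.parsers.expat


   class DoctypeForbidden(ValueError):
       """Hypothetical exception for this sketch; not part of any library."""


   def _reject_doctype(name, system_id, public_id, has_internal_subset):
       # Expat calls this handler as soon as it sees a DOCTYPE declaration,
       # before any entity declared in it could be expanded.
       raise DoctypeForbidden("DOCTYPE declaration not allowed: %r" % (name,))


   def element_names_without_dtd(data):
       """Return the element names in *data*, refusing any DOCTYPE."""
       names = []
       parser = xml.parsers.expat.ParserCreate()
       parser.StartDoctypeDeclHandler = _reject_doctype
       parser.StartElementHandler = lambda name, attrs: names.append(name)
       parser.Parse(data, True)
       return names

A document without a DTD parses as usual, while one that declares entities is
refused before any expansion takes place.  The price is the one named above:
well-formed documents that legitimately use an inline DTD are rejected as well.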


.. _defusedxml: https://pypi.python.org/pypi/defusedxml/
.. _defusedexpat: https://pypi.python.org/pypi/defusedexpat/
.. _Billion Laughs: http://en.wikipedia.org/wiki/Billion_laughs
.. _ZIP bomb: http://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: http://en.wikipedia.org/wiki/Document_Type_Definition