.. _xml:

XML Processing Modules
======================

.. module:: xml
   :synopsis: Package containing XML processing modules

.. sectionauthor:: Christian Heimes <christian@python.org>
.. sectionauthor:: Georg Brandl <georg@python.org>

**Source code:** :source:`Lib/xml/`

--------------

Python's interfaces for processing XML are grouped in the ``xml`` package.

.. warning::

   The XML modules are not secure against erroneous or maliciously
   constructed data.  If you need to parse untrusted or
   unauthenticated data see the :ref:`xml-vulnerabilities` and
   :ref:`defused-packages` sections.

The modules in the :mod:`xml` package require at least one SAX-compliant XML
parser to be available. The Expat parser is included with Python, so the
:mod:`xml.parsers.expat` module will always be available.
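
As a minimal sketch of using the bundled Expat binding directly, the following
collects the element names of a small document. The helper name
``element_names`` and the sample input are illustrative assumptions, not part
of the package.

```python
import xml.parsers.expat


def element_names(data):
    """Return the names of all start tags in document order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat calls this handler once per start tag with (name, attributes).
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    # The second argument marks this chunk as the final piece of input.
    parser.Parse(data, True)
    return names


names = element_names(b"<doc><item/><item/></doc>")
```

Most code will use one of the higher-level modules instead of calling Expat
this way.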

The documentation for the :mod:`xml.dom` and :mod:`xml.sax` packages is the
definition of the Python bindings for the DOM and SAX interfaces.

The XML handling submodules are:

* :mod:`xml.etree.ElementTree`: the ElementTree API, a simple and lightweight
  XML processor

..

* :mod:`xml.dom`: the DOM API definition
* :mod:`xml.dom.minidom`: a minimal DOM implementation
* :mod:`xml.dom.pulldom`: support for building partial DOM trees

..

* :mod:`xml.sax`: SAX2 base classes and convenience functions
* :mod:`xml.parsers.expat`: the Expat parser binding
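
To give a rough feel for how two of these submodules differ, the sketch below
reads the same trusted document with :mod:`xml.etree.ElementTree` and with
:mod:`xml.dom.minidom`. The sample document and the function names are
assumptions made for illustration only.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Illustrative, trusted input; do not parse untrusted data this way.
SAMPLE = "<catalog><book id='1'>Dune</book><book id='2'>Emma</book></catalog>"


def titles_with_etree(text):
    """Collect the text of every <book> element using ElementTree."""
    root = ET.fromstring(text)
    return [book.text for book in root.iter("book")]


def titles_with_minidom(text):
    """Collect the same text using the minimal DOM implementation."""
    dom = xml.dom.minidom.parseString(text)
    # In the DOM, element text lives in a child text node.
    return [node.firstChild.data for node in dom.getElementsByTagName("book")]
```

Both functions return the same list of titles; the ElementTree version exposes
text as an attribute, while the DOM version exposes it as a child node.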


.. _xml-vulnerabilities:

XML vulnerabilities
-------------------

The XML processing modules are not secure against maliciously constructed data.
An attacker can abuse XML features to carry out denial of service attacks,
access local files, generate network connections to other machines, or
circumvent firewalls.

The following table gives an overview of the known attacks and whether
the various modules are vulnerable to them.

=========================  ==============  ===============  ==============  ==============  ==============
kind                       sax             etree            minidom         pulldom         xmlrpc
=========================  ==============  ===============  ==============  ==============  ==============
billion laughs             **Vulnerable**  **Vulnerable**   **Vulnerable**  **Vulnerable**  **Vulnerable**
quadratic blowup           **Vulnerable**  **Vulnerable**   **Vulnerable**  **Vulnerable**  **Vulnerable**
external entity expansion  **Vulnerable**  Safe (1)         Safe (2)        **Vulnerable**  Safe (3)
`DTD`_ retrieval           **Vulnerable**  Safe             Safe            **Vulnerable**  Safe
decompression bomb         Safe            Safe             Safe            Safe            **Vulnerable**
=========================  ==============  ===============  ==============  ==============  ==============

1. :mod:`xml.etree.ElementTree` doesn't expand external entities and raises a
   :exc:`ParseError` when an external entity reference occurs.
2. :mod:`xml.dom.minidom` doesn't expand external entities and simply returns
   the unexpanded entity verbatim.
3. :mod:`xmlrpclib` doesn't expand external entities and omits them.
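
The first note can be checked with a short sketch. The payload below is a
hypothetical example whose system identifier points at a path that is never
read; the helper name ``rejects_external_entity`` is likewise an assumption
made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: declares an external entity and references it.
XXE_DOC = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE data [<!ENTITY xxe SYSTEM "file:///nonexistent/secret.txt">]>'
    "<data>&xxe;</data>"
)


def rejects_external_entity(text):
    """Return True if ElementTree refuses to parse the document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The reference is reported as an error instead of being resolved.
        return True
    return False
```

This only illustrates the external entity row of the table; it says nothing
about the entity expansion attacks, to which the module remains vulnerable.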


billion laughs / exponential entity expansion
  The `Billion Laughs`_ attack, also known as exponential entity expansion,
  uses multiple levels of nested entities. Each entity refers to another entity
  several times, and the final entity definition contains a small string.
  The exponential expansion results in several gigabytes of text and
  consumes large amounts of memory and CPU time.

quadratic blowup entity expansion
  A quadratic blowup attack is similar to a `Billion Laughs`_ attack; it abuses
  entity expansion, too. Instead of nested entities, it repeats one large entity
  of a few thousand characters over and over again. The attack isn't as
  efficient as the exponential case, but it avoids triggering parser
  countermeasures that forbid deeply nested entities.

external entity expansion
  Entity declarations can contain more than just text for replacement. They can
  also point to external resources or local files. The XML
  parser accesses the resource and embeds the content into the XML document.

`DTD`_ retrieval
  Some XML libraries like Python's :mod:`xml.dom.pulldom` retrieve document type
  definitions from remote or local locations. The feature has implications
  similar to those of the external entity expansion issue.

decompression bomb
  Decompression bombs (also known as a `ZIP bomb`_) apply to all XML libraries
  that can parse compressed XML streams such as gzipped HTTP streams or
  LZMA-compressed files. Compression lets an attacker reduce the amount of
  transmitted data by three orders of magnitude or more.
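
One common mitigation for the decompression bomb case is to cap how much
output is accepted before the data reaches an XML parser. The following is a
minimal sketch of that idea using :mod:`zlib`; the function name
``gunzip_bounded`` and the chosen limits are assumptions, and the XML modules
themselves do not provide this check.

```python
import gzip
import zlib


def gunzip_bounded(data, limit):
    """Decompress gzip bytes, refusing to produce more than `limit` bytes."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    # Ask for at most one byte more than allowed, so overflow is detectable
    # without ever materialising the full expansion.
    out = decomp.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError("decompressed data exceeds %d bytes" % limit)
    return out


small = gzip.compress(b"<doc/>")
# A megabyte of identical bytes shrinks to roughly a kilobyte.
bomb = gzip.compress(b"a" * 1000000)
```

Here ``gunzip_bounded(small, 1024)` returns the original bytes, while
``gunzip_bounded(bomb, 10000)`` raises :exc:`ValueError`.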

The documentation for `defusedxml`_ on PyPI has further information about
all known attack vectors with examples and references.
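
The arithmetic behind the two entity expansion attacks can be seen with
deliberately tiny, harmless inputs. In the sketch below, the function names,
entity names, and sizes are all assumptions chosen for illustration; real
attacks use far larger values.

```python
import xml.etree.ElementTree as ET


def nested_entity_doc(levels, fanout, seed="ha"):
    """Build a tiny Billion-Laughs-style document (keep the numbers small)."""
    decls = ['<!ENTITY e0 "%s">' % seed]
    for i in range(1, levels + 1):
        # Each level refers to the previous level `fanout` times.
        refs = ("&e%d;" % (i - 1)) * fanout
        decls.append('<!ENTITY e%d "%s">' % (i, refs))
    return "<!DOCTYPE d [%s]><d>&e%d;</d>" % ("".join(decls), levels)


def repeated_entity_doc(size, repeats):
    """Build a tiny quadratic-blowup-style document."""
    body = "&a;" * repeats
    return '<!DOCTYPE d [<!ENTITY a "%s">]><d>%s</d>' % ("x" * size, body)


nested = nested_entity_doc(levels=3, fanout=3)
# Expanded length is len(seed) * fanout ** levels.
nested_len = len(ET.fromstring(nested).text)

flat = repeated_entity_doc(size=100, repeats=50)
# Expanded length is size * repeats, from an input of roughly size + 4 * repeats.
flat_len = len(ET.fromstring(flat).text)
```

With these values the nested document expands to 54 characters and the flat
one to 5000 characters. Raising ``levels`` grows the first exponentially,
while raising both ``size`` and ``repeats`` grows the second quadratically.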

.. _defused-packages:

The :mod:`defusedxml` and :mod:`defusedexpat` Packages
------------------------------------------------------

`defusedxml`_ is a pure Python package with modified subclasses of all stdlib
XML parsers that prevent any potentially malicious operation. Use of this
package is recommended for any server code that parses untrusted XML data. The
package also ships with example exploits and extended documentation on more
XML exploits such as XPath injection.

`defusedexpat`_ provides a modified libexpat and a patched
:mod:`pyexpat` module that have countermeasures against entity expansion
DoS attacks. The :mod:`defusedexpat` module still allows a sane and
configurable number of entity expansions. The modifications may be included in
some future release of Python, but will not be included in any bugfix releases
of Python because they break backward compatibility.
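
A minimal sketch of how server code might route untrusted input through
`defusedxml`_ follows. It assumes the third-party package is installed from
PyPI; the wrapper name ``parse_untrusted`` and its refusal to fall back to the
standard library are illustrative choices, not part of either package.

```python
try:
    # Third-party package; it is not part of the standard library.
    from defusedxml import EntitiesForbidden
    from defusedxml.ElementTree import fromstring as safe_fromstring
except ImportError:
    EntitiesForbidden = None
    safe_fromstring = None


def parse_untrusted(text):
    """Parse untrusted XML with defusedxml, or refuse if it is missing."""
    if safe_fromstring is None:
        # Failing closed avoids silently using the unprotected parsers.
        raise RuntimeError("defusedxml is required to parse untrusted XML")
    return safe_fromstring(text)
```

With the package installed, a document that declares entities is rejected with
``EntitiesForbidden`` under the package's default settings, while ordinary
documents parse as usual.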


.. _defusedxml: https://pypi.python.org/pypi/defusedxml/
.. _defusedexpat: https://pypi.python.org/pypi/defusedexpat/
.. _Billion Laughs: https://en.wikipedia.org/wiki/Billion_laughs
.. _ZIP bomb: https://en.wikipedia.org/wiki/Zip_bomb
.. _DTD: https://en.wikipedia.org/wiki/Document_type_definition