:mod:`html.parser` --- Simple HTML and XHTML parser
===================================================

.. module:: html.parser
   :synopsis: A simple parser that can handle HTML and XHTML.


.. index::
   single: HTML
   single: XHTML

**Source code:** :source:`Lib/html/parser.py`

--------------

This module defines a class :class:`HTMLParser` which serves as the basis for
parsing text files formatted in HTML (HyperText Markup Language) and XHTML.

.. class:: HTMLParser(strict=True)

   Create a parser instance.  If *strict* is ``True`` (the default), invalid
   HTML results in :exc:`~html.parser.HTMLParseError` exceptions [#]_.  If
   *strict* is ``False``, the parser uses heuristics to make a best guess at
   the intention of any invalid HTML it encounters, similar to the way most
   browsers do.  Using ``strict=False`` is advised.

   An :class:`.HTMLParser` instance is fed HTML data and calls handler methods
   when start tags, end tags, text, comments, and other markup elements are
   encountered.  The user should subclass :class:`.HTMLParser` and override its
   methods to implement the desired behavior.

   This parser does not check that end tags match start tags or call the end-tag
   handler for elements which are closed implicitly by closing an outer element.

   .. versionchanged:: 3.2 *strict* keyword added

An exception is defined as well:


.. exception:: HTMLParseError

   Exception raised by the :class:`HTMLParser` class when it encounters an error
   while parsing and *strict* is ``True``.  This exception provides three
   attributes: :attr:`msg` is a brief message explaining the error,
   :attr:`lineno` is the number of the line on which the broken construct was
   detected, and :attr:`offset` is the number of characters into the line at
   which the construct starts.


Example HTML Parser Application
-------------------------------

As a basic example, below is a simple HTML parser that uses the
:class:`HTMLParser` class to print out start tags, end tags, and data
as they are encountered::

   from html.parser import HTMLParser

   class MyHTMLParser(HTMLParser):
       def handle_starttag(self, tag, attrs):
           print("Encountered a start tag:", tag)
       def handle_endtag(self, tag):
           print("Encountered an end tag :", tag)
       def handle_data(self, data):
           print("Encountered some data  :", data)

   parser = MyHTMLParser(strict=False)
   parser.feed('<html><head><title>Test</title></head>'
               '<body><h1>Parse me!</h1></body></html>')

The output will then be::

   Encountered a start tag: html
   Encountered a start tag: head
   Encountered a start tag: title
   Encountered some data  : Test
   Encountered an end tag : title
   Encountered an end tag : head
   Encountered a start tag: body
   Encountered a start tag: h1
   Encountered some data  : Parse me!
   Encountered an end tag : h1
   Encountered an end tag : body
   Encountered an end tag : html


:class:`.HTMLParser` Methods
----------------------------

:class:`HTMLParser` instances have the following methods:


.. method:: HTMLParser.feed(data)

   Feed some text to the parser.  It is processed insofar as it consists of
   complete elements; incomplete data is buffered until more data is fed or
   :meth:`close` is called.  *data* must be :class:`str`.


.. method:: HTMLParser.close()

   Force processing of all buffered data as if it were followed by an end-of-file
   mark.  This method may be redefined by a derived class to define additional
   processing at the end of the input, but the redefined version should always call
   the :class:`HTMLParser` base class method :meth:`close`.

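As a rough sketch of the :meth:`close` advice above, the hypothetical
``TextCollector`` subclass below redefines :meth:`close` and calls the base
class version first.  The class name and its ``chunks`` and ``finished``
attributes are illustrative assumptions, not part of the module; the *strict*
argument is left at its default because the input is valid HTML.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Hypothetical subclass: gathers text and notes when input has ended."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.finished = False

    def handle_data(self, data):
        # May be called several times for one run of text.
        self.chunks.append(data)

    def close(self):
        # Let the base class process any buffered data first.
        super().close()
        self.finished = True

parser = TextCollector()
parser.feed('<p>Hello, ')   # the input may arrive in arbitrary pieces
parser.feed('world</p>')
parser.close()
text = ''.join(parser.chunks)
```

Joining the collected chunks after :meth:`close` avoids depending on how the
text happened to be split between calls to :meth:`feed`.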

.. method:: HTMLParser.reset()

   Reset the instance.  Loses all unprocessed data.  This is called implicitly at
   instantiation time.


.. method:: HTMLParser.getpos()

   Return the current line number and offset as a ``(lineno, offset)`` tuple.

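One possible use of :meth:`getpos` is recording where each start tag begins.
The sketch below assumes a hypothetical ``PositionLogger`` subclass; in this
sketch line numbers are counted from 1 and offsets from 0.

```python
from html.parser import HTMLParser

class PositionLogger(HTMLParser):
    """Hypothetical subclass: remembers the position of every start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() reports where the construct being handled starts.
        self.positions.append((tag, self.getpos()))

parser = PositionLogger()
parser.feed('<html>\n  <body>')
parser.close()
# parser.positions is now [('html', (1, 0)), ('body', (2, 2))]
```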

.. method:: HTMLParser.get_starttag_text()

   Return the text of the most recently opened start tag.  This should not normally
   be needed for structured processing, but may be useful in dealing with HTML "as
   deployed" or for re-generating input with minimal changes (whitespace between
   attributes can be preserved, etc.).

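To illustrate the difference between the normalized tag name and the text
returned by :meth:`get_starttag_text`, here is a small sketch.  The
``RawTagRecorder`` class and its ``raw_tags`` attribute are made up for this
example.

```python
from html.parser import HTMLParser

class RawTagRecorder(HTMLParser):
    """Hypothetical subclass: pairs each tag name with its source text."""

    def __init__(self):
        super().__init__()
        self.raw_tags = []

    def handle_starttag(self, tag, attrs):
        # *tag* is lower-cased; the start tag text keeps the original
        # case and the whitespace between attributes.
        self.raw_tags.append((tag, self.get_starttag_text()))

parser = RawTagRecorder()
parser.feed('<A   HREF="/docs">link</A>')
parser.close()
# parser.raw_tags is now [('a', '<A   HREF="/docs">')]
```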

The following methods are called when data or markup elements are encountered
and they are meant to be overridden in a subclass.  The base class
implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):


.. method:: HTMLParser.handle_starttag(tag, attrs)

   This method is called to handle the start of a tag (e.g. ``<div id="main">``).

   The *tag* argument is the name of the tag converted to lower case. The *attrs*
   argument is a list of ``(name, value)`` pairs containing the attributes found
   inside the tag's ``<>`` brackets.  Each *name* is converted to lower case;
   quotes are removed from each *value*, and character and entity references in
   it are replaced.

   For instance, for the tag ``<A HREF="http://www.cwi.nl/">``, this method
   would be called as ``handle_starttag('a', [('href', 'http://www.cwi.nl/')])``.

   All entity references from :mod:`html.entities` are replaced in the attribute
   values.

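The attribute handling described for :meth:`handle_starttag` can be sketched
with a link collector.  ``LinkCollector`` and its ``links`` attribute are
hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical subclass: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        for name, value in attrs:
            # Names arrive lower-cased; value is None when the attribute
            # was written without a value.
            if name == 'href' and value is not None:
                self.links.append(value)

parser = LinkCollector()
parser.feed('<A HREF="http://www.cwi.nl/">CWI</A>'
            '<a href="search?a=1&amp;b=2">Search</a>')
parser.close()
# parser.links is now ['http://www.cwi.nl/', 'search?a=1&b=2']
```

The second link shows that the entity reference in the attribute value has
already been replaced by the time the handler is called.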

.. method:: HTMLParser.handle_endtag(tag)

   This method is called to handle the end tag of an element (e.g. ``</div>``).

   The *tag* argument is the name of the tag converted to lower case.


.. method:: HTMLParser.handle_startendtag(tag, attrs)

   Similar to :meth:`handle_starttag`, but called when the parser encounters an
   XHTML-style empty tag (``<img ... />``).  This method may be overridden by
   subclasses which require this particular lexical information; the default
   implementation simply calls :meth:`handle_starttag` and :meth:`handle_endtag`.

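The following sketch contrasts the default behavior of
:meth:`handle_startendtag` with an overriding version.  Both classes and the
``events`` attribute are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser

class EventRecorder(HTMLParser):
    """Hypothetical subclass: relies on the default handle_startendtag()."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

class EmptyTagRecorder(EventRecorder):
    """Hypothetical subclass: records XHTML-style empty tags separately."""

    def handle_startendtag(self, tag, attrs):
        self.events.append(('startend', tag))

default = EventRecorder()
default.feed('<br/>')
default.close()
# default.events is now [('start', 'br'), ('end', 'br')]

custom = EmptyTagRecorder()
custom.feed('<br/>')
custom.close()
# custom.events is now [('startend', 'br')]
```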

.. method:: HTMLParser.handle_data(data)

   This method is called to process arbitrary data (e.g. text nodes and the
   content of ``<script>...</script>`` and ``<style>...</style>``).


.. method:: HTMLParser.handle_entityref(name)

   This method is called to process a named character reference of the form
   ``&name;`` (e.g. ``&gt;``), where *name* is a general entity reference
   (e.g. ``'gt'``).


.. method:: HTMLParser.handle_charref(name)

   This method is called to process decimal and hexadecimal numeric character
   references of the form ``&#NNN;`` and ``&#xNNN;``.  For example, the decimal
   equivalent for ``&gt;`` is ``&#62;``, whereas the hexadecimal is ``&#x3E;``;
   in this case the method will receive ``'62'`` or ``'x3E'``.

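A handler for numeric references usually has to turn the *name* string into a
character.  The helper below is a minimal sketch of that conversion;
``charref_to_char`` is a hypothetical function, not part of the module, and
accepting an upper-case ``'X'`` prefix is an assumption made for robustness.

```python
def charref_to_char(name):
    """Convert the *name* passed to handle_charref() into a character."""
    if name[:1] in ('x', 'X'):
        # Hexadecimal form, e.g. 'x3E'.
        return chr(int(name[1:], 16))
    # Decimal form, e.g. '62'.
    return chr(int(name))

decoded = [charref_to_char('62'), charref_to_char('x3E')]
# decoded is now ['>', '>']
```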

.. method:: HTMLParser.handle_comment(data)

   This method is called when a comment is encountered (e.g. ``<!--comment-->``).

   For example, the comment ``<!-- comment -->`` will cause this method to be
   called with the argument ``' comment '``.

   The content of Internet Explorer conditional comments (condcoms) will also be
   sent to this method, so, for ``<!--[if IE 9]>IE9-specific content<![endif]-->``,
   this method will receive ``'[if IE 9]>IE9-specific content<![endif]'``.


.. method:: HTMLParser.handle_decl(decl)

   This method is called to handle an HTML doctype declaration (e.g.
   ``<!DOCTYPE html>``).

   The *decl* parameter will be the entire contents of the declaration inside
   the ``<!...>`` markup (e.g. ``'DOCTYPE html'``).


.. method:: HTMLParser.handle_pi(data)

   Method called when a processing instruction is encountered.  The *data*
   parameter will contain the entire processing instruction. For example, for the
   processing instruction ``<?proc color='red'>``, this method would be called as
   ``handle_pi("proc color='red'")``.  It is intended to be overridden by a derived
   class; the base class implementation does nothing.

   .. note::

      The :class:`HTMLParser` class uses the SGML syntactic rules for processing
      instructions.  An XHTML processing instruction using the trailing ``'?'`` will
      cause the ``'?'`` to be included in *data*.

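The note about the trailing ``'?'`` can be checked with a short sketch.  The
``PIRecorder`` class and its ``instructions`` attribute are hypothetical names
used only here.

```python
from html.parser import HTMLParser

class PIRecorder(HTMLParser):
    """Hypothetical subclass: stores the data of processing instructions."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_pi(self, data):
        self.instructions.append(data)

parser = PIRecorder()
parser.feed("<?proc color='red'>")      # SGML-style instruction
parser.feed('<?xml version="1.0"?>')    # XHTML-style, with trailing '?'
parser.close()
# parser.instructions is now ["proc color='red'", 'xml version="1.0"?']
```

A subclass that expects XHTML input may want to strip the trailing ``'?'``
itself before using the data.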

.. method:: HTMLParser.unknown_decl(data)

   This method is called when an unrecognized declaration is read by the parser.

   The *data* parameter will be the entire contents of the declaration inside
   the ``<![...]>`` markup.  Overriding this method in a derived class is
   sometimes useful.  The base class implementation raises an
   :exc:`HTMLParseError` when *strict* is ``True``.


.. _htmlparser-examples:

Examples
--------

The following class implements a parser that will be used to illustrate more
examples::

   from html.parser import HTMLParser
   from html.entities import name2codepoint

   class MyHTMLParser(HTMLParser):
       def handle_starttag(self, tag, attrs):
           print("Start tag:", tag)
           for attr in attrs:
               print("     attr:", attr)
       def handle_endtag(self, tag):
           print("End tag  :", tag)
       def handle_data(self, data):
           print("Data     :", data)
       def handle_comment(self, data):
           print("Comment  :", data)
       def handle_entityref(self, name):
           c = chr(name2codepoint[name])
           print("Named ent:", c)
       def handle_charref(self, name):
           if name.startswith('x'):
               c = chr(int(name[1:], 16))
           else:
               c = chr(int(name))
           print("Num ent  :", c)
       def handle_decl(self, data):
           print("Decl     :", data)

   parser = MyHTMLParser(strict=False)

Parsing a doctype::

   >>> parser.feed('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
   ...             '"http://www.w3.org/TR/html4/strict.dtd">')
   Decl     : DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"

Parsing an element with a few attributes and a title::

   >>> parser.feed('<img src="python-logo.png" alt="The Python logo">')
   Start tag: img
        attr: ('src', 'python-logo.png')
        attr: ('alt', 'The Python logo')
   >>>
   >>> parser.feed('<h1>Python</h1>')
   Start tag: h1
   Data     : Python
   End tag  : h1

The content of ``script`` and ``style`` elements is returned as is, without
further parsing::

   >>> parser.feed('<style type="text/css">#python { color: green }</style>')
   Start tag: style
        attr: ('type', 'text/css')
   Data     : #python { color: green }
   End tag  : style
   >>>
   >>> parser.feed('<script type="text/javascript">'
   ...             'alert("<strong>hello!</strong>");</script>')
   Start tag: script
        attr: ('type', 'text/javascript')
   Data     : alert("<strong>hello!</strong>");
   End tag  : script

Parsing comments::

   >>> parser.feed('<!-- a comment -->'
   ...             '<!--[if IE 9]>IE-specific content<![endif]-->')
   Comment  :  a comment
   Comment  : [if IE 9]>IE-specific content<![endif]

Parsing named and numeric character references and converting them to the
correct char (note: these 3 references are all equivalent to ``'>'``)::

   >>> parser.feed('&gt;&#62;&#x3E;')
   Named ent: >
   Num ent  : >
   Num ent  : >

Feeding incomplete chunks to :meth:`~HTMLParser.feed` works, but
:meth:`~HTMLParser.handle_data` might be called more than once::

   >>> for chunk in ['<sp', 'an>buff', 'ered ', 'text</s', 'pan>']:
   ...     parser.feed(chunk)
   ...
   Start tag: span
   Data     : buff
   Data     : ered
   Data     : text
   End tag  : span

Parsing invalid HTML (e.g. unquoted attributes) also works::

   >>> parser.feed('<p><a class=link href=#main>tag soup</p ></a>')
   Start tag: p
   Start tag: a
        attr: ('class', 'link')
        attr: ('href', '#main')
   Data     : tag soup
   End tag  : p
   End tag  : a

.. rubric:: Footnotes

.. [#] For backward compatibility reasons *strict* mode does not raise
       exceptions for all non-compliant HTML.  That is, some invalid HTML
       is tolerated even in *strict* mode.