fixed a nasty bug #119387, a bad heuristic in the progressive HTML parser

* HTMLparser.c: fixed a nasty bug #119387, a bad heuristic in
  the progressive HTML parser front-end on large character data
  islands, leading to an erroneous end-of-data detection by the
  parser. Some cleanup too, to get closer to the XML progressive
  parser.
Daniel
2 files changed