Narayan Kamath | 70dce01 | 2013-10-21 12:26:25 +0100 | 1 | Changes from 1.2 to 1.2.1 |
| 2 | ========================= |
| 3 | Match DOCTYPE case-blind |
| 4 | Extend PushbackReader's size for oddball cases like & followed by CR |
| 5 | Leo Sutic's 2x-4x speedup by precompiling HTMLScanner table |
| 6 | |
| 7 | Changes from 1.1.3 to 1.2 |
| 8 | ========================= |
| 9 | Changed license to Apache 2.0 |
| 10 | Bogon default model is now ANY, not EMPTY |
| 11 | Support new DOCTYPE output switches --doctype-system and --doctype-public |
| 12 | Support new XML declaration output switches --standalone and --version |
| 13 | New --norootbogons switch makes bogons children of the root |
| 14 | Don't resolve entity references in attribute values unless semicolon-terminated |
| 15 | Support character entities above U+FFFF |
| 16 | Add character entities from the 2007-12-14 draft of xml-entity-names |
| 17 | Call SAX events startPrefixMapping and endPrefixMapping to report prefixes |
| 18 | Clean up newline processing, shrinking html.stml considerably |
| 19 | Allow link elements in the body as well as the head, to avoid excess bodies |
| 20 | Allow tables inside paragraphs |
| 21 | Allow cells and forms in thead and tfoot elements without intervening tr element |
| 22 | The span element is no longer restartable |
| 23 | Support non-standard elements bgsound, blink, canvas, comment, listing, |
| 24 | marquee, nobr, ruby, rbc, rtc, rb, rt, rp, wbr, xmp |
| 25 | In HTML mode, boolean attributes like checked are output in minimized form |
| 26 | Correctly handle runs of less-than characters |
| 27 | Suppress all but the first DOCTYPE declaration |
| 28 | Modify PI targets containing colons to have underscores instead |
| 29 | The case of element tags is now canonicalized to the schema |
| 30 | PI targets are no longer forced to lower case |
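The entry on character entities above U+FFFF can be illustrated with a minimal sketch. In Java, a code point beyond the Basic Multilingual Plane does not fit in one `char` and has to be delivered as a surrogate pair. The class and method names below are hypothetical and are not taken from the TagSoup sources; only `Character.toChars` and `Character.isValidCodePoint` are real JDK calls.

```java
// Illustrative only: SupplementaryEntity and toUtf16 are hypothetical names,
// not part of TagSoup.
final class SupplementaryEntity {
    // Convert a numeric code point into the UTF-16 chars that a SAX
    // characters() callback would carry.
    static char[] toUtf16(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] bmp = toUtf16(0x00E9);      // fits in a single char
        char[] astral = toUtf16(0x1D504);  // above U+FFFF, needs a surrogate pair
        System.out.println(bmp.length + " " + astral.length); // prints "1 2"
    }
}
```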
| 31 | |
| 32 | Changes from 1.1.2 to 1.1.3 |
| 33 | =========================== |
| 34 | Allow Parser.set* methods to accept null |
| 35 | Allow setting the LexicalHandler feature to be null |
| 36 | (null in both cases means "use default behavior") |
| 37 | |
| 38 | Changes from 1.1.1 to 1.1.2 |
| 39 | =========================== |
| 40 | Setting CDATAElementsFeature didn't really set CDATAElements instance variable |
| 41 | |
| 42 | Changes from 1.1 to 1.1.1 |
| 43 | ========================= |
| 44 | Removed lexical handler calls to startCDATA/endCDATA from CDATA element handling |
| 45 | Added lexical handler calls to startCDATA/endCDATA from CDATA section handling |
| 46 | Added CDATAElementsFeature, the programmatic equivalent of the --nocdata switch |
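The startCDATA/endCDATA entries above refer to the standard SAX2 `LexicalHandler` callbacks. The sketch below shows how an application might observe those events for a CDATA section. It assumes the JDK's built-in XML parser as a stand-in, because TagSoup itself is not on the classpath here; the class name `CdataEvents` is hypothetical. The property URI is the standard SAX2 lexical-handler property.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Illustrative only: uses the JDK SAX parser, not TagSoup's Parser class.
final class CdataEvents {
    // Parse the document and record the lexical CDATA events in order.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void startCDATA() { log.add("startCDATA"); }
            @Override public void endCDATA() { log.add("endCDATA"); }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            // Standard SAX2 property for registering a LexicalHandler.
            reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<r><![CDATA[x<y]]></r>"));
    }
}
```

With a real TagSoup `Parser`, the same handler would presumably be registered through the same SAX2 property, but that is not verified here.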
| 47 | |
| 48 | Changes from 1.0.5 to 1.1 |
| 49 | ========================= |
| 50 | Add Tatu Saloranta's JAXP support package |
| 51 | |
| 52 | Changes from 1.0.4 to 1.0.5 |
| 53 | =========================== |
| 54 | Major repairs to comment scanning |
| 55 | Skip leading BOM |
| 56 | Comment out debugging code in PYXWriter |
| 57 | Allow &#X as well as &#x |
| 58 | Add net.sf.saxon to list of supported XSLT engines |
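The "&#X as well as &#x" entry concerns numeric character references. A minimal sketch of accepting either case of the hex marker follows; `CharRef.parse` is a hypothetical helper that receives the text between `&#` and `;`, and it is not the scanner's actual implementation.

```java
// Illustrative only: CharRef is a hypothetical helper, not TagSoup code.
final class CharRef {
    // body is the text between "&#" and ";", e.g. "65", "x41" or "X41".
    static int parse(String body) {
        if (body.startsWith("x") || body.startsWith("X")) {
            return Integer.parseInt(body.substring(1), 16); // hexadecimal form
        }
        return Integer.parseInt(body, 10);                  // decimal form
    }

    public static void main(String[] args) {
        // All three spellings denote the letter 'A' (code point 65).
        System.out.println(parse("65") + " " + parse("x41") + " " + parse("X41"));
    }
}
```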
| 59 | |
| 60 | Changes from 1.0.3 to 1.0.4 |
| 61 | =========================== |
| 62 | Certain options were mutually exclusive that should not have been |
| 63 | Blocked XML declaration from specifying an encoding of "" |
| 64 | --method=html was not doing the right thing |
| 65 | |
| 66 | Changes from 1.0.2 to 1.0.3 |
| 67 | =========================== |
| 68 | Fixed build file to use Java target version 1.4 |
| 69 | Fixed --version switch to print the right thing |
| 70 | |
| 71 | Changes from 1.0.1 to 1.0.2 |
| 72 | =========================== |
| 73 | Version attribute default value removed from html element |
| 74 | Leading and trailing hyphens now trimmed properly from comments |
| 75 | Added --output-encoding switch to control encoding |
| 76 | If output encoding is Unicode, don't generate character references |
| 77 | Whitespace compressed and junk stripped from public identifiers |
| 78 | |
| 79 | Changes from 1.0 to 1.0.1 |
| 80 | ========================= |
| 81 | Added ignorableWhitespaceFeature and --ignorable to report ignorable whitespace |
| 82 | Patch due to David Pashley |
| 83 | Insert spaces to break up -- in comments |
| 84 | Change bogus chars in publicids to spaces |
| 85 | --lexical switch now outputs DOCTYPE if there is one |
| 86 | Remove unnecessary blank line after XML declaration |
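The entry about breaking up `--` in comments reflects an XML rule: comment text may not contain two adjacent hyphens and may not end with a hyphen. One way to do this is sketched below. `CommentText.escape` is a hypothetical name, and the trailing-hyphen handling is an assumption of this sketch rather than something the changelog states.

```java
// Illustrative only: CommentText is a hypothetical helper, not TagSoup code.
final class CommentText {
    // Insert a space after any hyphen that is directly followed by another
    // hyphen, so the result never contains "--".
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            sb.append(c);
            if (c == '-' && i + 1 < text.length() && text.charAt(i + 1) == '-') {
                sb.append(' ');
            }
        }
        // Assumption: pad a trailing hyphen so it cannot fuse with "-->".
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) == '-') {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("<!--" + escape("a--b") + "-->"); // prints "<!--a- -b-->"
    }
}
```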
| 87 | |
| 88 | Changes from 1.0rc9 to 1.0 |
| 89 | ========================== |
| 90 | Added feature to control restartability |
| 91 | Patch due to Nikita Zhuk |
| 92 | Added corresponding --norestart switch in CommandLine |
| 93 | Made translate-colons feature actually work |
| 94 | |
| 95 | Changes from 1.0rc8 to 1.0rc9 |
| 96 | ============================= |
| 97 | If there is a publicid but no systemid, set systemid to "" |
| 98 | |
| 99 | Changes from 1.0rc7 to 1.0rc8 |
| 100 | ============================= |
| 101 | Fixed paper-bag bug (source didn't match binary in release) |
| 102 | |
| 103 | Changes from 1.0rc6 to 1.0rc7 |
| 104 | ============================= |
| 105 | LexicalHandler now gets DOCTYPE information (publicid and systemid) |
| 106 | Patch due to Mike Bremford |
| 107 | HTMLScanner now reports more useful debug output when not commented out |
| 108 | Patch due to Mike Bremford |
| 109 | Change "<memberOfAny>" to exclude "<root>" pseudo-element |
| 110 | This prevents "script" from being output as a root |
| 111 | The shared HTMLParser object has been eliminated |
| 112 | |
| 113 | Changes from 1.0rc5 to 1.0rc6 |
| 114 | ============================= |
| 115 | If namespaceFeature is false, uri and localname are passed as empty strings |
| 116 | The namespacePrefixesFeature is now always false |
| 117 | Command line switch --nons no longer affects namespacePrefixesFeature |
| 118 | Command line switch --html now implies --nons |
| 119 | XMLWriter is now told directly to use the schema's URI as default namespace |
| 120 | XMLWriter now takes the element name from the qname if localname is empty |
| 121 | |
| 122 | Changes from 1.0rc4 to 1.0rc5 |
| 123 | ============================= |
| 124 | The --nodefaults switch now removes only default attributes, not all of them |
| 125 | Added --nocolons switch and translate-colons feature to convert ":" |
| 126 | in names to "_" (thus suppressing namespaces other than the basic one) |
| 127 | The root element can be unknown without problem |
| 128 | Empty <script/> and <style/> tags now work |
| 129 | Added all standard SAX2 features to feature hashtable |
| 130 | Reimplemented namespacePrefixes feature (broken since 1.0rc3) |
| 131 | |
| 132 | Changes from 1.0rc3 to 1.0rc4 |
| 133 | ============================= |
| 134 | Remove trailing ? from processing instructions (in case the input is XHTML) |
| 135 | Added Javadocs for all SAX standard and TagSoup-specific features and properties |
| 136 | Fixed termination conditions for entity/character references |
| 137 | Fixed EOF-pushback bug that was generating bogus &#x65535; references |
| 138 | Added Parser feature and --nodefaults switch to ignore default attribute values |
| 139 | Added support for SAX Locator |
| 140 | Updated AFL license to version 3.0 |
| 141 | Scanner buffer size increases as needed, allowing large attribute values |
| 142 | Look for various XSLT implementations as available (still fails in raw 5.0) |
| 143 | Clean up handling of XML empty tags and SGML minimized end-tags |
| 144 | Support proper options and help message internally |
| 145 | Use Hashtable in CommandLine class instead of HashMap |
| 146 | Do proper buffering of InputStream and Reader |
| 147 | Clean up content model of noframes element |
| 148 | Removed htmlMode in XMLWriter |
| 149 | Added support for XSLT output options METHOD=html and OMIT_XML_DECLARATION=yes |
| 150 | Command line option --html sets both of these |
| 151 | Wrote simple validator for TSSL schemas (tssl/tssl-validator.xslt) |
| 152 | Removed various validity problems in html.tssl |
| 153 | When processing a start-tag, don't restart elements that aren't in the new |
| 154 | element's content model |
| 155 | Remove bogus double param in tssl.xslt |
| 156 | |
| 157 | Changes from 1.0rc2 to 1.0rc3 |
| 158 | ============================= |
| 159 | Convert CR and CRLF to LF in comments and PIs |
| 160 | Force empty elements to close immediately |
| 161 | Match close tags of CDATA elements more precisely (but case-blind) |
| 162 | Process switches on the command line |
| 163 | Man page available |
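The newline entry above (CR and CRLF converted to LF) is the usual XML-style line-end normalization. A minimal sketch, using the hypothetical name `Newlines.normalize`, might look like this; it is not the scanner's state-table implementation.

```java
// Illustrative only: Newlines is a hypothetical helper, not TagSoup code.
final class Newlines {
    // Replace CRLF pairs first, then any remaining lone CR, with LF.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String out = normalize("a\r\nb\rc\n");
        System.out.println(out.equals("a\nb\nc\n")); // prints "true"
    }
}
```

Handling CRLF before lone CR matters: doing it the other way round would turn each CRLF into two line feeds.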
| 164 | |
| 165 | Changes from 1.0rc1 to 1.0rc2 |
| 166 | ============================= |
| 167 | Isolated & and &# now don't crash parser |
| 168 | TagSoup no longer depends on /dev/stdin existing |
| 169 | Refactored Parser class, removing main method to new CommandLine class |
| 170 | Changes to content models of form, button, table, and tr elements in html.tssl |
| 171 | '</scr' + 'ipt>' in a script element no longer terminates it |
| 172 | Introduced "uncloseability" of form and table elements |
| 173 | "pyxin" property specifies that input is in PYX format |
| 174 | Correctly cope with unexpected characters around colons, also with multiple colons |
| 175 | Correctly output comments with "--" in them (by adding a space) |
| 176 | |
| 177 | Changes from 0.10.2 to 1.0rc1 |
| 178 | ============================= |
| 179 | Script can now appear anywhere |
| 180 | Switch -nocdata correctly implemented |
| 181 | Eliminated useless M_n constants in Schema |
| 182 | Introduced <memberOfAny> and <isRoot> as alternatives to |
| 183 | <memberOf> in TSSL |
| 184 | Allow prefixes in element names |
| 185 | Attributes are now normalized |
| 186 | Expanded public API for Element and ElementType |
| 187 | Javadoc improved |
| 188 | |
| 189 | Changes from 0.10.1 to 0.10.2 |
| 190 | ============================= |
| 191 | Removed misfeature whereby > terminated a tag even inside quotes |
| 192 | Added licensing language to XSLT scripts, RELAX NG schemas |
| 193 | Removed long-standing mishandling of entity references in attributes |
| 194 | Cleaned up logic for converting junky strings to proper XML Names |
| 195 | Correctly handle empty tag that has no whitespace or attributes |
| 196 | Restore correct 0.9.3 handling of an apparent end-tag in a CDATA element |
| 197 | Added script element to content model of head element |
| 198 | |
| 199 | Changes from 0.9.7 to 0.10.1 (there is no 0.10.0): |
| 200 | ================================================== |
| 201 | Convert to XSLT configuration exclusively; |
| 202 | Perl code and tab-separated tables are gone |
| 203 | Remove xmlns:* attributes |
| 204 | Append "_" to attribute names ending in ":" |
| 205 | Don't prepend "_" to an attribute name starting in "_" |
| 206 | Handle namespace prefixes in attributes: |
| 207 | "xml" prefix is handled correctly |
| 208 | other prefixes are mapped to "urn:x-prefix:foo" |
| 209 | Ignore XML declarations |
| 210 | -Dnocdata=true turns off F_CDATA on script and style elements |
| 211 | Fixed off-by-one errors in character references that made them uninterpreted |
| 212 | Start-tags ending in a minimized attribute are no longer being dropped |
| 213 | XML empty tags are now supported (though slashes are still allowed in |
| 214 | unquoted attribute values) |
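The attribute-name rules in this section (append "_" to names ending in ":", and do not prepend "_" to names that already start with "_") combine with the 0.9.6 rule that "_" is prepended to names not beginning with a letter. The sketch below covers only those three rules as the changelog words them. `AttrName.fix` is a hypothetical helper, and the real name-cleaning logic very likely handles more cases.

```java
// Illustrative only: AttrName is a hypothetical helper covering just the
// rules named in the changelog, not TagSoup's actual name-cleaning code.
final class AttrName {
    static String fix(String name) {
        StringBuilder sb = new StringBuilder(name);
        // Prepend "_" unless the name already starts with a letter or "_".
        if (sb.length() == 0
                || !(Character.isLetter(sb.charAt(0)) || sb.charAt(0) == '_')) {
            sb.insert(0, '_');
        }
        // Append "_" when the name ends in ":" so the prefix has a local part.
        if (sb.charAt(sb.length() - 1) == ':') {
            sb.append('_');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fix("1abc") + " " + fix("_x") + " " + fix("foo:"));
    }
}
```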
| 215 | |
| 216 | Changes from 0.9.6 to 0.9.7: |
| 217 | ============================ |
| 218 | Upgraded AFL to version 2.1 |
| 219 | Passed through newlines in character content (very old bug) |
| 220 | |
| 221 | Changes from 0.9.5 to 0.9.6: |
| 222 | ============================ |
| 223 | Script element can appear directly in body |
| 224 | ">" terminates a start-tag even inside a quoted attribute, |
| 225 | to protect against unbalanced quotes |
| 226 | "_" is prepended to attributes that don't begin with a letter |
| 227 | Remove "xmlns" attributes from the input |
| 228 | All standard features can now be set |
| 229 | (although there is no effect from doing so) |
| 230 | New "bogons-empty" feature can be set to false to give bogons |
| 231 | content model of ANY rather than EMPTY; |
| 232 | -Dany switch sets this feature to false |
| 233 | TSSL now has an explicit group element to declare an element group |
| 234 | STML is a new XML format for modeling state-table changes |
| 235 | License updated to AFL 2.1 |
| 236 | |
| 237 | Changes from 0.9.4 to 0.9.5: |
| 238 | ============================ |
| 239 | S in the statetable now means \r and \n and \t as well as space |
| 240 | (as was always intended; brain fart!) |
| 241 | Ins and del elements are now allowed everywhere |
| 242 | TSSL now correctly supports attributes that are legal on all elements |
| 243 | |
| 244 | Changes from 0.9.3 to 0.9.4: |
| 245 | ============================ |
| 246 | Fixed paper-bag bug that revealed attribute type BOOLEAN to applications. |
| 247 | Obsolete ABSTRACT removed in favor of README. |
| 248 | Improved implementation of CDATA restart after bogus end-tag. |
| 249 | Allowed hyphen, underscore, and period in names as well as colon. |
| 250 | First cut at TagSoup Schema Language -- doesn't do anything yet. |
| 251 | Support CDATA sections on input. |
| 252 | Don't generate built-in entities within CDATA elements. |
| 253 | |
| 254 | Changes from 0.9.2 to 0.9.3: |
| 255 | ============================ |
| 256 | Convenience main program "tagsoup" in bin directory. |
| 257 | Begin to integrate tests. |
| 258 | Introduced BOOLEAN type (currently just converted to NMTOKEN). |
| 259 | Features that actually work are now named constants in Parser. |
| 260 | Double root elements are really gone now. |
| 261 | ID attributes weren't being removed from restarted elements. |
| 262 | Fixed a bug that made unknown elements disappear in some cases. |
| 263 | Parser is now safely reusable. |
| 264 | PYXWriter and XMLWriter now implement LexicalHandler. |
| 265 | Parser reports comments, startCDATA, and endCDATA events to a LexicalHandler. |
| 266 | ScanHandler methods now throw only SAXException, not also IOException. |
| 267 | -Dlexical=true switch sets the ContentHandler as a LexicalHandler as well |
| 268 | (XMLWriter prints comments, ignores CDATA sections; PYXWriter ignores all). |
| 269 | -Dreuse=true switch reuses a single Parser object (no great speed gain). |
| 270 | We now disallow an a element as the child of another a element. |
| 271 | An empty input is now treated as zero-length character content. |
| 272 | HTMLWriter is gone in favor of an extended XMLWriter with get/setHTMLMode methods. |
| 273 | CDATA elements only terminate with matching end-tags (thanks to Sebastien Bardoux). |
| 274 | |
| 275 | Changes from 0.9.1 to 0.9.2: |
| 276 | ============================ |
| 277 | No longer inserts bogus ; after unknown entity reference without ;. |
| 278 | Consecutive entity references now work correctly. |
| 279 | Setting namespaces and namespace-prefixes methods now works. |
| 280 | -Dnons=true option turns off namespace and prefix. |
| 281 | New feature "http://www.ccil.org/~cowan/tagsoup/features/ignore-bogons" |
| 282 | suppresses unknown start-tags (any end-tag will be automatically ignored). |
| 283 | -Dnobogons=true option turns ignore-bogons on. |
| 284 | Suppress unknown and/or empty initial start-tag always |
| 285 | (prevents double root element). |
| 286 | Schema now allows style as an inline element, like script. |
| 287 | Schema now allows tr as a child of table to avoid problems with embedded tables. |
| 288 | Clear Parser instance variables to make Parsers properly reusable. |
| 289 | |
| 290 | Changes from 0.9 to 0.9.1: |
| 291 | ========================== |
| 292 | Incorporated patch for -jar support by Joseph Walton. |
| 293 | Incorporated patch for Megginson XMLWriter support by Joseph Walton. |
| 294 | Changed existing XMLWriter to HTMLWriter. |
| 295 | Rewrote Parser.main for better features, removed Tester class. |
| 296 | -Dnewline=true removed, now implied by -DHTML=true. |
| 297 | -Dfiles=true now used to generate separate outputs (old Tester behavior) |
| 298 | with extension xhtml (removing any old extension). |
| 299 | Fixed nasty bug in HTMLScanner that was failing to fix unusual entities. |
| 300 | Don't attempt to smash whitespace to spaces any more. |
| 301 | |
| 302 | Changes from 0.8 to 0.9: |
| 303 | ======================== |
| 304 | Ant-ified by Martin Rademacher. |
| 305 | Don't suppress colons in element names. |
| 306 | Entity problems fixed (I hope). |
| 307 | Can now set namespace and namespace-prefixes features (without effect). |
| 308 | Properly templatize HTMLModels.java. |
| 309 | Attributes are no longer in the HTML namespace. |