Changes from 1.2 to 1.2.1
=========================
Match DOCTYPE case-blind
Extend PushbackReader's size for oddball cases like & followed by CR
Leo Sutic's 2x-4x speedup by precompiling HTMLScanner table

Changes from 1.1.3 to 1.2
=========================
Changed license to Apache 2.0
Bogon default model is now ANY, not EMPTY
Support new DOCTYPE output switches --doctype-system and --doctype-public
Support new XML declaration output switches --standalone and --version
New --norootbogons switch makes bogons children of the root
Don't resolve entity references in attribute values unless semicolon-terminated
Support character entities above U+FFFF
Add character entities from the 2007-12-14 draft of xml-entity-names
Call SAX events startPrefixMapping and endPrefixMapping to report prefixes
Clean up newline processing, shrinking html.stml considerably
Allow link elements in the body as well as the head, to avoid excess bodies
Allow tables inside paragraphs
Allow cells and forms in thead and tfoot elements without intervening tr element
The span element is no longer restartable
Support non-standard elements bgsound, blink, canvas, comment, listing,
 marquee, nobr, ruby, rbc, rtc, rb, rt, rp, wbr, xmp
In HTML mode, boolean attributes like checked are output in minimized form
Correctly handle runs of less-than characters
Suppress all but the first DOCTYPE declaration
Modify PI targets containing colons to have underscores instead
The case of element tags is now canonicalized to the schema
PI targets are no longer forced to lower case
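
Why characters above U+FFFF need special support: a Java char holds only
16 bits, so such a character has to be emitted as a surrogate pair. The
sketch below is illustrative only and is not TagSoup's code; the class and
method names are hypothetical, and it checks only that the code point is
valid Unicode, not that it is a legal XML character.

```java
// Hypothetical sketch, not part of TagSoup: decode a numeric character
// reference, including code points above U+FFFF.
final class CharRefSketch {

    /** Returns the decoded text, or null if the reference is malformed. */
    static String decodeNumericReference(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            return null;
        }
        String digits = ref.substring(2, ref.length() - 1);
        int radix = 10;
        // Accept both &#x and &#X as the hexadecimal marker.
        if (digits.startsWith("x") || digits.startsWith("X")) {
            radix = 16;
            digits = digits.substring(1);
        }
        // Reject empty bodies and leading signs such as "+41" or "-1".
        if (digits.isEmpty() || Character.digit(digits.charAt(0), radix) < 0) {
            return null;
        }
        int codePoint;
        try {
            codePoint = Integer.parseInt(digits, radix);
        } catch (NumberFormatException e) {
            return null;
        }
        if (!Character.isValidCodePoint(codePoint)) {
            return null;
        }
        // Character.toChars yields one char for the BMP, two (a surrogate
        // pair) for anything above U+FFFF.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String s = decodeNumericReference("&#x1D49C;");
        System.out.println(s.length());                      // 2 chars
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
    }
}
```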

Changes from 1.1.2 to 1.1.3
===========================
Allow Parser.set* methods to accept null
Allow setting the LexicalHandler feature to null
 (null in both cases means "use default behavior")

Changes from 1.1.1 to 1.1.2
===========================
Setting CDATAElementsFeature didn't really set CDATAElements instance variable

Changes from 1.1 to 1.1.1
=========================
Removed lexical handler calls to startCDATA/endCDATA from CDATA element handling
Added lexical handler calls to startCDATA/endCDATA from CDATA section handling
Added CDATAElementsFeature, the programmatic equivalent of the --nocdata switch

Changes from 1.0.5 to 1.1
=========================
Add Tatu Saloranta's JAXP support package

Changes from 1.0.4 to 1.0.5
===========================
Major repairs to comment scanning
Skip leading BOM
Comment out debugging code in PYXWriter
Allow &#X as well as &#x
Add net.sf.saxon to list of supported XSLT engines

Changes from 1.0.3 to 1.0.4
===========================
Certain options were mutually exclusive that should not have been
Blocked XML declaration from specifying an encoding of ""
--method=html was not doing the right thing

Changes from 1.0.2 to 1.0.3
===========================
Fixed build file to use Java target version 1.4
Fixed --version switch to print the right thing

Changes from 1.0.1 to 1.0.2
===========================
Version attribute default value removed from html element
Leading and trailing hyphens now trimmed properly from comments
Added --output-encoding switch to control encoding
If output encoding is Unicode, don't generate character references
Whitespace compressed and junk stripped from public identifiers

Changes from 1.0 to 1.0.1
=========================
Added ignorableWhitespaceFeature and --ignorable to report ignorable whitespace
 Patch due to David Pashley
Insert spaces to break up -- in comments
Change bogus chars in publicids to spaces
--lexical switch now outputs DOCTYPE if there is one
Remove unnecessary blank line after XML declaration
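
XML does not allow "--" inside a comment, which is why spaces are inserted
to break such runs up. A minimal sketch of that idea follows. It is not
TagSoup's implementation, and the class and method names are hypothetical.

```java
// Hypothetical sketch, not part of TagSoup: make comment text safe for XML
// output by separating adjacent hyphens with a space.
final class CommentSketch {

    static String breakDoubleHyphens(String text) {
        StringBuilder out = new StringBuilder(text.length() + 4);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // If the previous emitted char is also a hyphen, split the pair.
            if (c == '-' && out.length() > 0
                    && out.charAt(out.length() - 1) == '-') {
                out.append(' ');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(breakDoubleHyphens("a--b")); // prints "a- -b"
    }
}
```

The result never contains two adjacent hyphens. Hyphens at the very start
or end of the comment would still need separate handling.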

Changes from 1.0rc9 to 1.0
==========================
Added feature to control restartability
 Patch due to Nikita Zhuk
Added corresponding --norestart switch in CommandLine
Made translate-colons feature actually work

Changes from 1.0rc8 to 1.0rc9
=============================
If there is a publicid but no systemid, set systemid to ""

Changes from 1.0rc7 to 1.0rc8
=============================
Fixed paper-bag bug (source didn't match binary in release)

Changes from 1.0rc6 to 1.0rc7
=============================
LexicalHandler now gets DOCTYPE information (publicid and systemid)
 Patch due to Mike Bremford
HTMLScanner now reports more useful debug output when not commented out
 Patch due to Mike Bremford
Change "<memberOfAny>" to exclude "<root>" pseudo-element
 This prevents "script" from being output as a root
The shared HTMLParser object has been eliminated

Changes from 1.0rc5 to 1.0rc6
=============================
If namespaceFeature is false, uri and localname are passed as empty strings
The namespacePrefixesFeature is now always false
Command line switch --nons no longer affects namespacePrefixesFeature
Command line switch --html now implies --nons
XMLWriter is now told directly to use the schema's URI as default namespace
XMLWriter now takes the element name from the qname if localname is empty

Changes from 1.0rc4 to 1.0rc5
=============================
The --nodefaults switch now removes only default attributes, not all of them
Added --nocolons switch and translate-colons feature to convert ":"
 in names to "_" (thus suppressing namespaces other than the basic one)
The root element can be unknown without problem
Empty <script/> and <style/> tags now work
Added all standard SAX2 features to feature hashtable
Reimplemented namespacePrefixes feature (broken since 1.0rc3)

Changes from 1.0rc3 to 1.0rc4
=============================
Remove trailing ? from processing instructions (in case the input is XHTML)
Added Javadocs for all SAX standard and TagSoup-specific features and properties
Fixed termination conditions for entity/character references
Fixed EOF-pushback bug that was generating bogus &#x65535; references
Added Parser feature and --nodefaults switch to ignore default attribute values
Added support for SAX Locator
Updated AFL license to version 3.0
Scanner buffer size increases as needed, allowing large attribute values
Look for various XSLT implementations as available (still fails in raw 5.0)
Clean up handling of XML empty tags and SGML minimized end-tags
Support proper options and help message internally
Use Hashtable in CommandLine class instead of HashMap
Do proper buffering of InputStream and Reader
Clean up content model of noframes element
Removed htmlMode in XMLWriter
Added support for XSLT output options METHOD=html and OMIT_XML_DECLARATION=yes
Command line option --html sets both of these
Wrote simple validator for TSSL schemas (tssl/tssl-validator.xslt)
Removed various validity problems in html.tssl
When processing a start-tag, don't restart elements that aren't in the new
 element's content model
Remove bogus double param in tssl.xslt

Changes from 1.0rc2 to 1.0rc3
=============================
Convert CR and CRLF to LF in comments and PIs
Force empty elements to close immediately
Match close tags of CDATA elements more precisely (but case-blind)
Process switches on the command line
Man page available
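
One way to picture "more precise but case-blind" close-tag matching for a
CDATA element such as script is sketched below. This is an assumption about
the general technique, not TagSoup's scanner; the class and method names
are hypothetical.

```java
// Hypothetical sketch, not part of TagSoup: locate the end-tag that closes
// a CDATA element, ignoring case but requiring the full element name.
final class CdataCloseSketch {

    /** Returns the index of the matching "</name", or -1 if there is none. */
    static int findCloseTag(String content, String elementName) {
        String needle = "</" + elementName;
        // Stop early enough that one character always follows the name.
        for (int i = 0; i + needle.length() < content.length(); i++) {
            if (content.regionMatches(true, i, needle, 0, needle.length())) {
                char next = content.charAt(i + needle.length());
                // The name must end here, so "</scriptx>" is not a match.
                if (next == '>' || Character.isWhitespace(next)) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String src = "var s = '</scr' + 'ipt>';</SCRIPT>";
        // The split string literal is skipped; only </SCRIPT> matches.
        System.out.println(findCloseTag(src, "script") == src.indexOf("</SCRIPT>"));
    }
}
```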

Changes from 1.0rc1 to 1.0rc2
=============================
Isolated & and &# now don't crash parser
TagSoup no longer depends on /dev/stdin existing
Refactored Parser class, removing main method to new CommandLine class
Changes to content models of form, button, table, and tr elements in html.tssl
'</scr' + 'ipt>' in a script element no longer terminates it
Introduced "uncloseability" of form and table elements
"pyxin" property specifies that input is in PYX format
Correctly cope with unexpected characters around colons, also with multiple colons
Correctly output comments with "--" in them (by adding a space)

Changes from 0.10.2 to 1.0rc1
=============================
Script can now appear anywhere
Switch -nocdata correctly implemented
Eliminated useless M_n constants in Schema
Introduced <memberOfAny> and <isRoot> as alternatives to
 <memberOf> in TSSL
Allow prefixes in element names
Attributes are now normalized
Expanded public API for Element and ElementType
Javadoc improved

Changes from 0.10.1 to 0.10.2
=============================
Removed misfeature whereby > terminated a tag even inside quotes
Added licensing language to XSLT scripts, RELAX NG schemas
Removed long-standing mishandling of entity references in attributes
Cleaned up logic for converting junky strings to proper XML Names
Correctly handle empty tag that has no whitespace or attributes
Restore correct 0.9.3 handling of an apparent end-tag in a CDATA element
Added script element to content model of head element

Changes from 0.9.7 to 0.10.1 (there is no 0.10.0):
==================================================
Convert to XSLT configuration exclusively;
 Perl code and tab-separated tables are gone
Remove xmlns:* attributes
Append "_" to attribute names ending in ":"
Don't prepend "_" to an attribute name starting in "_"
Handle namespace prefixes in attributes:
 "xml" prefix is handled correctly
 other prefixes are mapped to "urn:x-prefix:foo"
Ignore XML declarations
-Dnocdata=true turns off F_CDATA on script and style elements
Fixed off-by-one errors in character references that made them uninterpreted
Start-tags ending in a minimized attribute are no longer being dropped
XML empty tags are now supported (though slashes are still allowed in
 unquoted attribute values)
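
The attribute-name rules listed above can be sketched roughly as follows.
This is a hypothetical illustration, not TagSoup's code: the class and
method names are invented, and mapping the "xml" prefix to the standard XML
namespace URI is an assumption about what "handled correctly" means.

```java
// Hypothetical sketch, not part of TagSoup: the attribute-name rules from
// this release, written as small independent helpers.
final class AttributeNameSketch {

    // The namespace URI reserved for the "xml" prefix.
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    /** xmlns:* attributes are removed from the input. */
    static boolean isDropped(String name) {
        return name.startsWith("xmlns:");
    }

    /** A name ending in ":" gets "_" appended so a local part exists. */
    static String fixTrailingColon(String name) {
        return name.endsWith(":") ? name + "_" : name;
    }

    /** Maps a prefixed name to a namespace URI; "" means no namespace. */
    static String namespaceFor(String qname) {
        int colon = qname.indexOf(':');
        if (colon <= 0) {
            return "";
        }
        String prefix = qname.substring(0, colon);
        if (prefix.equals("xml")) {
            return XML_NS;
        }
        // Any other prefix "foo" becomes "urn:x-prefix:foo".
        return "urn:x-prefix:" + prefix;
    }

    public static void main(String[] args) {
        System.out.println(namespaceFor("foo:bar")); // prints "urn:x-prefix:foo"
        System.out.println(fixTrailingColon("foo:")); // prints "foo:_"
    }
}
```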

Changes from 0.9.6 to 0.9.7:
============================
Upgraded AFL to version 2.1
Passed through newlines in character content (very old bug)

Changes from 0.9.5 to 0.9.6:
============================
Script element can appear directly in body
">" terminates a start-tag even inside a quoted attribute,
 to protect against unbalanced quotes
"_" is prepended to attributes that don't begin with a letter
Remove "xmlns" attributes from the input
All standard features can now be set
 (although there is no effect from doing so)
New "bogons-empty" feature can be set to false to give bogons
 content model of ANY rather than EMPTY;
 -Dany switch sets this feature to false
TSSL now has an explicit group element to declare an element group
STML is a new XML format for modeling state-table changes
License updated to AFL 2.1

Changes from 0.9.4 to 0.9.5:
============================
S in the statetable now means \r and \n and \t as well as space
 (as was always intended; brain fart!)
Ins and del elements are now allowed everywhere
TSSL now correctly supports attributes that are legal on all elements

Changes from 0.9.3 to 0.9.4:
============================
Fixed paper-bag bug that revealed attribute type BOOLEAN to applications.
Obsolete ABSTRACT removed in favor of README.
Improved implementation of CDATA restart after bogus end-tag.
Allowed hyphen, underscore, and period in names as well as colon.
First cut at TagSoup Schema Language -- doesn't do anything yet.
Support CDATA sections on input.
Don't generate built-in entities within CDATA elements.

Changes from 0.9.2 to 0.9.3:
============================
Convenience main program "tagsoup" in bin directory.
Begin to integrate tests.
Introduced BOOLEAN type (currently just converted to NMTOKEN).
Features that actually work are now named constants in Parser.
Double root elements are really gone now.
ID attributes weren't being removed from restarted elements.
Fixed a bug that made unknown elements disappear in some cases.
Parser is now safely reusable.
PYXWriter and XMLWriter now implement LexicalHandler.
Parser reports comments, startCDATA, and endCDATA events to a LexicalHandler.
ScanHandler methods now throw only SAXException, not also IOException.
-Dlexical=true switch sets the ContentHandler as a LexicalHandler as well
 (XMLWriter prints comments, ignores CDATA sections; PYXWriter ignores all).
-Dreuse=true switch reuses a single Parser object (no great speed gain).
We now disallow an a element as the child of another a element.
An empty input is now treated as zero-length character content.
HTMLWriter is gone in favor of an extended XMLWriter with get/setHTMLMode methods.
CDATA elements only terminate with matching end-tags (thanks to Sebastien Bardoux).

Changes from 0.9.1 to 0.9.2:
============================
No longer inserts bogus ; after unknown entity reference without ;.
Consecutive entity references now work correctly.
Setting namespaces and namespace-prefixes methods now works.
-Dnons=true option turns off namespace and prefix.
New feature "http://www.ccil.org/~cowan/tagsoup/features/ignore-bogons"
 suppresses unknown start-tags (any end-tag will be automatically ignored).
-Dnobogons=true option turns ignore-bogons on.
Suppress unknown and/or empty initial start-tag always
 (prevents double root element).
Schema now allows style as an inline element, like script.
Schema now allows tr as a child of table to avoid problems with embedded tables.
Clear Parser instance variables to make Parsers properly reusable.

Changes from 0.9 to 0.9.1:
==========================
Incorporated patch for -jar support by Joseph Walton.
Incorporated patch for Megginson XMLWriter support by Joseph Walton.
Changed existing XMLWriter to HTMLWriter.
Rewrote Parser.main for better features, removed Tester class.
-Dnewline=true removed, now implied by -DHTML=true.
-Dfiles=true now used to generate separate outputs (old Tester behavior)
 with extension xhtml (removing any old extension).
Fixed nasty bug in HTMLScanner that was failing to fix unusual entities.
Don't attempt to smash whitespace to spaces any more.

Changes from 0.8 to 0.9:
========================
Ant-ified by Martin Rademacher.
Don't suppress colons in element names.
Entity problems fixed (I hope).
Can now set namespace and namespace-prefixes features (without effect).
Properly templatize HTMLModels.java.
Attributes are no longer in the HTML namespace.