| 1 | /** @mainpage |
| 2 | |
| 3 | <h1> TinyXml </h1> |
| 4 | |
| 5 | TinyXml is a simple, small, C++ XML parser that can be easily |
| 6 | integrated into other programs. |
| 7 | |
| 8 | <h2> What it does. </h2> |
| 9 | |
| 10 | In brief, TinyXml parses an XML document, and builds from that a |
| 11 | Document Object Model (DOM) that can be read, modified, and saved. |
| 12 | |
| 13 | XML stands for "eXtensible Markup Language." It allows you to create |
| 14 | your own document markups. Where HTML does a very good job of marking |
| 15 | documents for browsers, XML allows you to define any kind of document |
| 16 | markup, for example a document that describes a "to do" list for an |
| 17 | organizer application. XML is a very structured and convenient format. |
| 18 | All those random file formats created to store application data can |
| 19 | all be replaced with XML. One parser for everything. |
| 20 | |
| 21 | The best place for the complete, correct, and quite frankly hard to |
| 22 | read spec is at <a href="http://www.w3.org/TR/2004/REC-xml-20040204/"> |
| 23 | http://www.w3.org/TR/2004/REC-xml-20040204/</a>. An intro to XML |
| 24 | (that I really like) can be found at |
| 25 | <a href="http://skew.org/xml/tutorial/">http://skew.org/xml/tutorial</a>. |
| 26 | |
| 27 | There are different ways to access and interact with XML data. |
| 28 | TinyXml uses a Document Object Model (DOM), meaning the XML data is parsed |
| 29 | into C++ objects that can be browsed and manipulated, and then |
| 30 | written to disk or another output stream. You can also construct an XML document from |
| 31 | scratch with C++ objects and write this to disk or another output |
| 32 | stream. |
| 33 | |
| 34 | TinyXml is designed to be easy and fast to learn. It is two headers |
| 35 | and four cpp files. Simply add these to your project and off you go. |
| 36 | There is an example file - xmltest.cpp - to get you started. |
| 37 | |
| 38 | TinyXml is released under the ZLib license, |
| 39 | so you can use it in open source or commercial code. The details |
| 40 | of the license are at the top of every source file. |
| 41 | |
| 42 | TinyXml attempts to be a flexible parser, but with truly correct and |
| 43 | compliant XML output. TinyXml should compile on any reasonably C++ |
| 44 | compliant system. It does not rely on exceptions or RTTI. It can be |
| 45 | compiled with or without STL support. TinyXml fully supports |
| 46 | the UTF-8 encoding, and the first 64k character entities. |
| 47 | |
| 48 | |
| 49 | <h2> What it doesn't do. </h2> |
| 50 | |
| 51 | It doesn't parse or use DTDs (Document Type Definitions) or XSLs |
| 52 | (eXtensible Stylesheet Language). There are other parsers out there |
| 53 | (check out www.sourceforge.org, search for XML) that are much more fully |
| 54 | featured. But they are also much bigger, take longer to set up in |
| 55 | your project, have a steeper learning curve, and often have a more |
| 56 | restrictive license. If you are working with browsers or have more |
| 57 | complete XML needs, TinyXml is not the parser for you. |
| 58 | |
| 59 | The following DTD syntax will not parse at this time in TinyXml: |
| 60 | |
| 61 | @verbatim |
| 62 | <!DOCTYPE Archiv [ |
| 63 | <!ELEMENT Comment (#PCDATA)> |
| 64 | ]> |
| 65 | @endverbatim |
| 66 | |
| 67 | because TinyXml sees this as a !DOCTYPE node with an illegally |
| 68 | embedded !ELEMENT node. This may be addressed in the future. |
| 69 | |
| 70 | <h2> Tutorials. </h2> |
| 71 | |
| 72 | For the impatient, here is a tutorial to get you going. It is a great way to get started, |
| 73 | but it is still worth your time to read this (very short) manual completely. |
| 74 | |
| 75 | - @subpage tutorial0 |
| 76 | |
| 77 | <h2> Code Status. </h2> |
| 78 | |
| 79 | TinyXml is mature, tested code. It is very stable. If you find |
| 80 | bugs, please file a bug report on the sourceforge web site |
| 81 | (www.sourceforge.net/projects/tinyxml). |
| 82 | We'll get them straightened out as soon as possible. |
| 83 | |
| 84 | There are some areas of improvement; please check sourceforge if you are |
| 85 | interested in working on TinyXml. |
| 86 | |
| 87 | |
| 88 | <h2> Features </h2> |
| 89 | |
| 90 | <h3> Using STL </h3> |
| 91 | |
| 92 | TinyXml can be compiled to use or not use STL. When using STL, TinyXml |
| 93 | uses the std::string class, and fully supports std::istream, std::ostream, |
| 94 | operator<<, and operator>>. Many API methods have both 'const char*' and |
| 95 | 'const std::string&' forms. |
| 96 | |
| 97 | When STL support is compiled out, no STL files are included whatsoever. All |
| 98 | the string classes are implemented by TinyXml itself. API methods |
| 99 | all use the 'const char*' form for input. |
| 100 | |
| 101 | Use the compile time #define: |
| 102 | |
| 103 | TIXML_USE_STL |
| 104 | |
| 105 | to compile one version or the other. This can be passed by the compiler, |
| 106 | or set as the first line of "tinyxml.h". |
| 107 | |
| 108 | Note: If compiling the test code in Linux, setting the environment |
| 109 | variable TINYXML_USE_STL=YES/NO will control STL compilation. In the |
| 110 | Windows project file, STL and non STL targets are provided. In your project, |
| 111 | it's probably easiest to add the line "#define TIXML_USE_STL" as the first |
| 112 | line of tinyxml.h. |
| 113 | |
| 114 | <h3> UTF-8 </h3> |
| 115 | |
| 116 | TinyXml supports UTF-8, allowing you to manipulate XML files in any language. TinyXml |
| 117 | also supports "legacy mode" - the encoding used before UTF-8 support and |
| 118 | probably best described as "extended ascii". |
| 119 | |
| 120 | Normally, TinyXml will try to detect the correct encoding and use it. However, |
| 121 | by setting the value of TIXML_DEFAULT_ENCODING in the header file, TinyXml |
| 122 | can be forced to always use one encoding. |
| 123 | |
| 124 | TinyXml will assume Legacy Mode until one of the following occurs: |
| 125 | <ol> |
| 126 | <li> If the non-standard but common "UTF-8 lead bytes" (0xef 0xbb 0xbf) |
| 127 | begin the file or data stream, TinyXml will read it as UTF-8. </li> |
| 128 | <li> If the declaration tag is read, and it has an encoding="UTF-8", then |
| 129 | TinyXml will read it as UTF-8. </li> |
| 130 | <li> If the declaration tag is read, and it has no encoding specified, then |
| 131 | TinyXml will read it as UTF-8. </li> |
| 132 | <li> If the declaration tag is read, and it has an encoding="something else", then |
| 133 | TinyXml will read it as Legacy Mode. In legacy mode, TinyXml will |
| 134 | work as it did before. It's not clear what that mode does exactly, but |
| 135 | old content should keep working.</li> |
| 136 | <li> Until one of the above criteria is met, TinyXml runs in Legacy Mode.</li> |
| 137 | </ol> |
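
The detection rules above can be sketched as a small standalone function. This is only an illustration of the listed rules, not TinyXml's parser code: the `Encoding` enum and the `DetectEncoding` name are hypothetical, and the declaration tag is located with plain string searches rather than a real parse.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration only. TinyXml's own constants are
// TIXML_ENCODING_UTF8 and TIXML_ENCODING_LEGACY.
enum class Encoding { Legacy, Utf8 };

// Applies the rules listed above to the start of a buffer.
// Simplification: the encoding name is compared exactly, without any
// case folding or alias handling.
inline Encoding DetectEncoding(const std::string& data) {
    // Rule 1: the UTF-8 lead bytes 0xef 0xbb 0xbf begin the data.
    if (data.size() >= 3 &&
        static_cast<unsigned char>(data[0]) == 0xef &&
        static_cast<unsigned char>(data[1]) == 0xbb &&
        static_cast<unsigned char>(data[2]) == 0xbf) {
        return Encoding::Utf8;
    }

    // Rules 2-4 apply only once a declaration tag has been read.
    const std::size_t start = data.find("<?xml");
    if (start == std::string::npos) return Encoding::Legacy;  // rule 5
    const std::size_t end = data.find("?>", start);
    if (end == std::string::npos) return Encoding::Legacy;    // incomplete tag

    const std::string decl = data.substr(start, end - start);
    const std::size_t enc = decl.find("encoding=");
    if (enc == std::string::npos) return Encoding::Utf8;      // rule 3

    // Rule 2 versus rule 4: compare the quoted value with "UTF-8".
    const std::size_t valueStart = enc + 9;  // just past encoding=
    if (valueStart >= decl.size()) return Encoding::Legacy;
    const char quote = decl[valueStart];
    const std::size_t valueEnd = decl.find(quote, valueStart + 1);
    if (valueEnd == std::string::npos) return Encoding::Legacy;
    const std::string value =
        decl.substr(valueStart + 1, valueEnd - valueStart - 1);
    return value == "UTF-8" ? Encoding::Utf8 : Encoding::Legacy;
}
```

In this sketch a buffer with no declaration and no lead bytes stays in Legacy Mode, which mirrors the last rule in the list. When detection gets it wrong, forcing the mode explicitly (as described below) is the more dependable option.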
| 138 | |
| 139 | What happens if the encoding is incorrectly set or detected? TinyXml will try |
| 140 | to read and pass through text seen as improperly encoded. You may get some strange |
| 141 | results or mangled characters. You may want to force TinyXml to the correct mode. |
| 142 | |
| 143 | <b> You may force TinyXml to Legacy Mode by using LoadFile( TIXML_ENCODING_LEGACY ) or |
| 144 | LoadFile( filename, TIXML_ENCODING_LEGACY ). You may force it to use legacy mode all |
| 145 | the time by setting TIXML_DEFAULT_ENCODING = TIXML_ENCODING_LEGACY. Likewise, you may |
| 146 | force it to TIXML_ENCODING_UTF8 with the same technique.</b> |
| 147 | |
| 148 | For English users, using English XML, UTF-8 is the same as low-ASCII. You |
| 149 | don't need to be aware of UTF-8 or change your code in any way. You can think |
| 150 | of UTF-8 as a "superset" of ASCII. |
| 151 | |
| 152 | UTF-8 is not a double byte format - but it is a standard encoding of Unicode! |
| 153 | TinyXml does not use or directly support wchar, TCHAR, or Microsoft's _UNICODE at this time. |
| 154 | It is common to see the term "Unicode" improperly refer to UTF-16, a wide byte encoding |
| 155 | of Unicode. This is a source of confusion. |
| 156 | |
| 157 | For "high-ascii" languages - everything not English, pretty much - TinyXml can |
| 158 | handle all languages, at the same time, as long as the XML is encoded |
| 159 | in UTF-8. That can be a little tricky: older programs and operating systems |
| 160 | tend to use the "default" or "traditional" code page. Many apps (and almost all |
| 161 | modern ones) can output UTF-8, but older or stubborn (or just broken) ones |
| 162 | still output text in the default code page. |
| 163 | |
| 164 | For example, Japanese systems traditionally use SHIFT-JIS encoding. |
| 165 | Text encoded as SHIFT-JIS cannot be read by TinyXml. |
| 166 | A good text editor can import SHIFT-JIS and then save as UTF-8. |
| 167 | |
| 168 | The <a href="http://skew.org/xml/tutorial/">Skew.org link</a> does a great |
| 169 | job covering the encoding issue. |
| 170 | |
| 171 | The test file "utf8test.xml" is an XML file containing English, Spanish, Russian, |
| 172 | and Simplified Chinese. (Hopefully they are translated correctly). The file |
| 173 | "utf8test.gif" is a screen capture of the XML file, rendered in IE. Note that |
| 174 | if you don't have the correct fonts (Simplified Chinese or Russian) on your |
| 175 | system, you won't see output that matches the GIF file even if you can parse |
| 176 | it correctly. Also note that (at least on my Windows machine) console output |
| 177 | is in a Western code page, so that Print() or printf() cannot correctly display |
| 178 | the file. This is not a bug in TinyXml - just an OS issue. No data is lost or |
| 179 | destroyed by TinyXml. The console just doesn't render UTF-8. |
| 180 | |
| 181 | |
| 182 | <h3> Entities </h3> |
| 183 | TinyXml recognizes the pre-defined "character entities", meaning special |
| 184 | characters. Namely: |
| 185 | |
| 186 | @verbatim |
| 187 | &amp;   & |
| 188 | &lt;    < |
| 189 | &gt;    > |
| 190 | &quot;  " |
| 191 | &apos;  ' |
| 192 | @endverbatim |
| 193 | |
| 194 | These are recognized when the XML document is read, and translated to their |
| 195 | UTF-8 equivalents. For instance, text with the XML of: |
| 196 | |
| 197 | @verbatim |
| 198 | Far &amp; Away |
| 199 | @endverbatim |
| 200 | |
| 201 | will have the Value() of "Far & Away" when queried from the TiXmlText object, |
| 202 | and will be written back to the XML stream/file as an ampersand. Older versions |
| 203 | of TinyXml "preserved" character entities, but the newer versions will translate |
| 204 | them into characters. |
| 205 | |
| 206 | Additionally, any character can be specified by its Unicode code point: |
| 207 | The syntax "&#xA0;" or "&#160;" both refer to the non-breaking space character. |
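
To make the code point syntax concrete, here is a minimal sketch of turning a numeric reference into UTF-8 bytes. It is not TinyXml's implementation: `EncodeUtf8` and `DecodeNumericEntity` are hypothetical helper names, and the sketch does not reject code points outside the Unicode range.

```cpp
#include <cstdlib>
#include <string>

// Encodes one Unicode code point as UTF-8 bytes.
// Simplification: values above 0x10FFFF are not rejected.
inline std::string EncodeUtf8(unsigned long cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Decodes a single reference written in hexadecimal ("&#x...;") or
// decimal ("&#...;") form. Returns an empty string if the text is not
// a numeric reference.
inline std::string DecodeNumericEntity(const std::string& ref) {
    if (ref.size() < 4 || ref[0] != '&' || ref[1] != '#' || ref.back() != ';')
        return std::string();
    const bool hex = (ref[2] == 'x');
    const std::string digits =
        ref.substr(hex ? 3 : 2, ref.size() - (hex ? 4 : 3));
    if (digits.empty()) return std::string();
    char* end = nullptr;
    const unsigned long cp = std::strtoul(digits.c_str(), &end, hex ? 16 : 10);
    if (*end != '\0') return std::string();  // stray non-digit characters
    return EncodeUtf8(cp);
}
```

Both spellings of the non-breaking space (hexadecimal A0 and decimal 160) decode to the same two bytes, 0xC2 0xA0, which is the UTF-8 form of code point U+00A0.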
| 208 | |
| 209 | |
| 210 | <h3> Streams </h3> |
| 211 | With TIXML_USE_STL on, |
| 212 | TinyXml has been modified to support both C (FILE) and C++ (operator <<,>>) |
| 213 | streams. There are some differences that you may need to be aware of. |
| 214 | |
| 215 | C style output: |
| 216 | - based on FILE* |
| 217 | - the Print() and SaveFile() methods |
| 218 | |
| 219 | These methods generate formatted output, with plenty of white space, intended to be as |
| 220 | human-readable as possible. They are very fast, and tolerant of ill formed |
| 221 | XML documents. For example, an XML document that contains 2 root elements |
| 222 | and 2 declarations, will still print. |
| 223 | |
| 224 | C style input: |
| 225 | - based on FILE* |
| 226 | - the Parse() and LoadFile() methods |
| 227 | |
| 228 | A fast, tolerant read. Use whenever you don't need the C++ streams. |
| 229 | |
| 230 | C++ style output: |
| 231 | - based on std::ostream |
| 232 | - operator<< |
| 233 | |
| 234 | Generates condensed output, intended for network transmission rather than |
| 235 | readability. Depending on your system's implementation of the ostream class, |
| 236 | these may be somewhat slower. (Or may not.) Not tolerant of ill formed XML: |
| 237 | a document should contain exactly one root element. Additional root level |
| 238 | elements will not be streamed out. |
| 239 | |
| 240 | C++ style input: |
| 241 | - based on std::istream |
| 242 | - operator>> |
| 243 | |
| 244 | Reads XML from a stream, making it useful for network transmission. The tricky |
| 245 | part is knowing when the XML document is complete, since there will almost |
| 246 | certainly be other data in the stream. TinyXml will assume the XML data is |
| 247 | complete after it reads the root element. Put another way, documents that |
| 248 | are ill-constructed with more than one root element will not read correctly. |
| 249 | Also note that operator>> is somewhat slower than Parse, due to both |
| 250 | implementation of the STL and limitations of TinyXml. |
| 251 | |
| 252 | <h3> White space </h3> |
| 253 | The world simply does not agree on whether white space should be kept, or condensed. |
| 254 | For example, pretend the '_' is a space, and look at "Hello____world". HTML, and |
| 255 | at least some XML parsers, will interpret this as "Hello_world". They condense white |
| 256 | space. Some XML parsers do not, and will leave it as "Hello____world". (Remember |
| 257 | to keep pretending the _ is a space.) Others suggest that __Hello___world__ should become |
| 258 | Hello___world. |
| 259 | |
| 260 | It's an issue that hasn't been resolved to my satisfaction. TinyXml supports the |
| 261 | first 2 approaches. Call TiXmlBase::SetCondenseWhiteSpace( bool ) to set the desired behavior. |
| 262 | The default is to condense white space. |
| 263 | |
| 264 | If you change the default, you should call TiXmlBase::SetCondenseWhiteSpace( bool ) |
| 265 | before making any calls to parse XML data, and I don't recommend changing it after |
| 266 | it has been set. |
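
As an illustration of the "condense" approach, here is a minimal standalone sketch. It is not TinyXml's code: `CondenseWhiteSpace` is a hypothetical free function, and dropping leading and trailing white space is an assumption of this sketch rather than something stated above.

```cpp
#include <cctype>
#include <string>

// Collapses each run of white space into a single space.
// Assumption of this sketch: leading and trailing white space is dropped.
inline std::string CondenseWhiteSpace(const std::string& text) {
    std::string out;
    bool pendingSpace = false;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            // Remember the run, but only emit it if more text follows.
            pendingSpace = !out.empty();
        } else {
            if (pendingSpace) out += ' ';
            pendingSpace = false;
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

With this sketch, "Hello" followed by four spaces and "world" becomes "Hello world". The "keep" approach corresponds to returning the input unchanged.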
| 267 | |
| 268 | |
| 269 | <h3> Handles </h3> |
| 270 | |
| 271 | When browsing an XML document in a robust way, it is important to check |
| 272 | for null returns from method calls. An error safe implementation can |
| 273 | generate a lot of code like: |
| 274 | |
| 275 | @verbatim |
| 276 | TiXmlElement* root = document.FirstChildElement( "Document" ); |
| 277 | if ( root ) |
| 278 | { |
| 279 |     TiXmlElement* element = root->FirstChildElement( "Element" ); |
| 280 |     if ( element ) |
| 281 |     { |
| 282 |         TiXmlElement* child = element->FirstChildElement( "Child" ); |
| 283 |         if ( child ) |
| 284 |         { |
| 285 |             TiXmlElement* child2 = child->NextSiblingElement( "Child" ); |
| 286 |             if ( child2 ) |
| 287 |             { |
| 288 |                 // Finally do something useful. |
| 289 | @endverbatim |
| 290 | |
| 291 | Handles have been introduced to clean this up. Using the TiXmlHandle class, |
| 292 | the previous code reduces to: |
| 293 | |
| 294 | @verbatim |
| 295 | TiXmlHandle docHandle( &document ); |
| 296 | TiXmlElement* child2 = docHandle.FirstChild( "Document" ).FirstChild( "Element" ).Child( "Child", 1 ).Element(); |
| 297 | if ( child2 ) |
| 298 | { |
| 299 | // do something useful |
| 300 | @endverbatim |
| 301 | |
| 302 | Which is much easier to deal with. See TiXmlHandle for more information. |
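
The idea behind a handle can be shown with a toy, self-contained sketch. The `Node` and `Handle` types below are hypothetical stand-ins, not TinyXml classes: the point is only that every lookup returns another handle, so a failed step propagates as a null handle and the caller checks once at the end.

```cpp
#include <string>
#include <vector>

// A toy tree node standing in for a DOM node. Not a TinyXml type.
struct Node {
    std::string name;
    std::vector<Node> children;
};

// Wraps a possibly-null node pointer. Every lookup returns another Handle,
// so a failed step yields a null handle instead of a null dereference.
class Handle {
public:
    explicit Handle(const Node* node) : node_(node) {}

    // Returns the index-th child with the given name, or a null handle.
    Handle Child(const std::string& name, int index = 0) const {
        if (!node_) return Handle(nullptr);
        for (const Node& child : node_->children) {
            if (child.name == name && index-- == 0) return Handle(&child);
        }
        return Handle(nullptr);
    }

    // The single place where the caller checks for null.
    const Node* Get() const { return node_; }

private:
    const Node* node_;
};
```

A chain such as `Handle(&root).Child("Document").Child("Element").Child("Child", 1).Get()` then needs only one null check, however many steps it contains.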
| 303 | |
| 304 | |
| 305 | <h3> Row and Column tracking </h3> |
| 306 | Being able to track nodes and attributes back to their origin location |
| 307 | in source files can be very important for some applications. Additionally, |
| 308 | knowing where parsing errors occurred in the original source can be very |
| 309 | time saving. |
| 310 | |
| 311 | TinyXml can track the row and column origin of all nodes and attributes |
| 312 | in a text file. The TiXmlBase::Row() and TiXmlBase::Column() methods return |
| 313 | the origin of the node in the source text. The tab size used for column |
| 314 | counting can be configured with TiXmlDocument::SetTabSize(). |
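
To show why the tab size matters for column numbers, here is a minimal standalone sketch of row and column counting. It is not TinyXml's implementation: `Position` and `Locate` are hypothetical names, rows and columns are assumed to be 1-based, and each byte counts as one column (so multi-byte UTF-8 characters are over-counted).

```cpp
#include <cstddef>
#include <string>

struct Position {
    int row;     // 1-based in this sketch
    int column;  // 1-based in this sketch
};

// Computes the row and column of the character at `offset` in `text`.
// A tab advances the column to the next multiple of `tabSize`.
// If `tabSize` is not positive, a tab counts as a single column.
inline Position Locate(const std::string& text, std::size_t offset, int tabSize) {
    int row = 0;  // tracked 0-based, reported 1-based
    int col = 0;
    for (std::size_t i = 0; i < offset && i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\n') {
            ++row;
            col = 0;
        } else if (c == '\t' && tabSize > 0) {
            col = (col / tabSize + 1) * tabSize;  // jump to the next tab stop
        } else {
            ++col;
        }
    }
    return Position{row + 1, col + 1};
}
```

For a tag that follows one tab at the start of the second line, this sketch reports row 2, column 5 with a tab size of 4, and row 2, column 9 with a tab size of 8. The reported column only matches what an editor shows when the two agree on the tab size.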
| 315 | |
| 316 | |
| 317 | <h2> Using and Installing </h2> |
| 318 | |
| 319 | To Compile and Run xmltest: |
| 320 | |
| 321 | A Linux Makefile and a Windows Visual C++ .dsw file are provided. |
| 322 | Simply compile and run. It will write the file demotest.xml to your |
| 323 | disk and generate output on the screen. It also tests walking the |
| 324 | DOM by printing out the number of nodes found using different |
| 325 | techniques. |
| 326 | |
| 327 | The Linux makefile is very generic and will |
| 328 | probably run on other systems, but is only tested on Linux. You no |
| 329 | longer need to run 'make depend'. The dependencies have been |
| 330 | hard coded. |
| 331 | |
| 332 | <h3>Windows project file for VC6</h3> |
| 333 | <ul> |
| 334 | <li>tinyxml: tinyxml library, non-STL </li> |
| 335 | <li>tinyxmlSTL: tinyxml library, STL </li> |
| 336 | <li>tinyXmlTest: test app, non-STL </li> |
| 337 | <li>tinyXmlTestSTL: test app, STL </li> |
| 338 | </ul> |
| 339 | |
| 340 | <h3>Linux Make file</h3> |
| 341 | At the top of the makefile you can set: |
| 342 | |
| 343 | PROFILE, DEBUG, and TINYXML_USE_STL. Details (such as they are) are in |
| 344 | the makefile. |
| 345 | |
| 346 | In the tinyxml directory, type "make clean" then "make". The executable |
| 347 | file 'xmltest' will be created. |
| 348 | |
| 349 | |
| 350 | |
| 351 | <h3>To Use in an Application:</h3> |
| 352 | |
| 353 | Add tinyxml.cpp, tinyxml.h, tinyxmlerror.cpp, tinyxmlparser.cpp, tinystr.cpp, and tinystr.h to your |
| 354 | project or make file. That's it! It should compile on any reasonably |
| 355 | compliant C++ system. You do not need to enable exceptions or |
| 356 | RTTI for TinyXml. |
| 357 | |
| 358 | |
| 359 | <h2> How TinyXml works. </h2> |
| 360 | |
| 361 | An example is probably the best way to go. Take: |
| 362 | @verbatim |
| 363 | <?xml version="1.0" standalone="no" ?> |
| 364 | <!-- Our to do list data --> |
| 365 | <ToDo> |
| 366 |     <Item priority="1"> Go to the <bold>Toy store!</bold></Item> |
| 367 |     <Item priority="2"> Do bills</Item> |
| 368 | </ToDo> |
| 369 | @endverbatim |
| 370 | |
| 371 | It's not much of a To Do list, but it will do. To read this file |
| 372 | (say "demo.xml") you would create a document, and parse it in: |
| 373 | @verbatim |
| 374 | TiXmlDocument doc( "demo.xml" ); |
| 375 | doc.LoadFile(); |
| 376 | @endverbatim |
| 377 | |
| 378 | And it's ready to go. Now let's look at some lines and how they |
| 379 | relate to the DOM. |
| 380 | |
| 381 | @verbatim |
| 382 | <?xml version="1.0" standalone="no" ?> |
| 383 | @endverbatim |
| 384 | |
| 385 | The first line is a declaration, and gets turned into the |
| 386 | TiXmlDeclaration class. It will be the first child of the |
| 387 | document node. |
| 388 | |
| 389 | This is the only directive/special tag parsed by TinyXml. |
| 390 | Generally directive tags are stored in TiXmlUnknown so the |
| 391 | commands won't be lost when it is saved back to disk. |
| 392 | |
| 393 | @verbatim |
| 394 | <!-- Our to do list data --> |
| 395 | @endverbatim |
| 396 | |
| 397 | A comment. Will become a TiXmlComment object. |
| 398 | |
| 399 | @verbatim |
| 400 | <ToDo> |
| 401 | @endverbatim |
| 402 | |
| 403 | The "ToDo" tag defines a TiXmlElement object. This one does not have |
| 404 | any attributes, but does contain 2 other elements. |
| 405 | |
| 406 | @verbatim |
| 407 | <Item priority="1"> |
| 408 | @endverbatim |
| 409 | |
| 410 | Creates another TiXmlElement which is a child of the "ToDo" element. |
| 411 | This element has 1 attribute, with the name "priority" and the value |
| 412 | "1". |
| 413 | @verbatim |
| 414 | Go to the |
| 415 | @endverbatim |
| 416 | A TiXmlText. This is a leaf node and cannot contain other nodes. |
| 417 | It is a child of the "Item" TiXmlElement. |
| 418 | |
| 419 | @verbatim |
| 420 | <bold> |
| 421 | @endverbatim |
| 422 | |
| 423 | |
| 424 | Another TiXmlElement, this one a child of the "Item" element. |
| 425 | |
| 426 | Etc. |
| 427 | |
| 428 | Looking at the entire object tree, you end up with: |
| 429 | @verbatim |
| 430 | TiXmlDocument "demo.xml" |
| 431 |     TiXmlDeclaration "version='1.0'" "standalone=no" |
| 432 |     TiXmlComment " Our to do list data" |
| 433 |     TiXmlElement "ToDo" |
| 434 |         TiXmlElement "Item" Attributes: priority=1 |
| 435 |             TiXmlText "Go to the " |
| 436 |             TiXmlElement "bold" |
| 437 |                 TiXmlText "Toy store!" |
| 438 |         TiXmlElement "Item" Attributes: priority=2 |
| 439 |             TiXmlText "Do bills" |
| 440 | @endverbatim |
| 441 | |
| 442 | <h2> Documentation </h2> |
| 443 | |
| 444 | The documentation is built with Doxygen, using the 'dox' |
| 445 | configuration file. |
| 446 | |
| 447 | <h2> License </h2> |
| 448 | |
| 449 | TinyXml is released under the zlib license: |
| 450 | |
| 451 | This software is provided 'as-is', without any express or implied |
| 452 | warranty. In no event will the authors be held liable for any |
| 453 | damages arising from the use of this software. |
| 454 | |
| 455 | Permission is granted to anyone to use this software for any |
| 456 | purpose, including commercial applications, and to alter it and |
| 457 | redistribute it freely, subject to the following restrictions: |
| 458 | |
| 459 | 1. The origin of this software must not be misrepresented; you must |
| 460 | not claim that you wrote the original software. If you use this |
| 461 | software in a product, an acknowledgment in the product documentation |
| 462 | would be appreciated but is not required. |
| 463 | |
| 464 | 2. Altered source versions must be plainly marked as such, and |
| 465 | must not be misrepresented as being the original software. |
| 466 | |
| 467 | 3. This notice may not be removed or altered from any source |
| 468 | distribution. |
| 469 | |
| 470 | <h2> References </h2> |
| 471 | |
| 472 | The World Wide Web Consortium is the definitive standards body for |
| 473 | XML, and their web pages contain huge amounts of information. |
| 474 | |
| 475 | The definitive spec: <a href="http://www.w3.org/TR/2004/REC-xml-20040204/"> |
| 476 | http://www.w3.org/TR/2004/REC-xml-20040204/</a> |
| 477 | |
| 478 | I also recommend "XML Pocket Reference" by Robert Eckstein, published by |
| 479 | O'Reilly...the book that got the whole thing started. |
| 480 | |
| 481 | <h2> Contributors, Contacts, and a Brief History </h2> |
| 482 | |
| 483 | Thanks very much to everyone who sends suggestions, bugs, ideas, and |
| 484 | encouragement. It all helps, and makes this project fun. A special thanks |
| 485 | to the contributors on the web pages that keep it lively. |
| 486 | |
| 487 | So many people have sent in bugs and ideas that, rather than list them here, |
| 488 | we try to give credit where due in the "changes.txt" file. |
| 489 | |
| 490 | TinyXml was originally written by Lee Thomason. (Often the "I" still |
| 491 | in the documentation.) Lee reviews changes and releases new versions, |
| 492 | with the help of Yves Berquin and the TinyXml community. |
| 493 | |
| 494 | We appreciate your suggestions, and would love to know if you |
| 495 | use TinyXml. Hopefully you will enjoy it and find it useful. |
| 496 | Please post questions, comments, file bugs, or contact us at: |
| 497 | |
| 498 | www.sourceforge.net/projects/tinyxml |
| 499 | |
| 500 | Lee Thomason, |
| 501 | Yves Berquin |
| 502 | */ |