Blame - doc/xml.html - platform/external/libxml2

blob: 5500349c57121259e963775d068beace54f2b2a6

Daniel Veillard	ccb0963	1998-10-27 06:21:04 +0000	[diff] [blame]	1	<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
				2	"http://www.w3.org/TR/REC-html40/loose.dtd">
				3	<html>
				4	<head>
Daniel Veillard	25940b7	1998-10-29 05:51:30 +0000	[diff] [blame]	5	<title>The XML library for Gnome</title>
Daniel Veillard	ccb0963	1998-10-27 06:21:04 +0000	[diff] [blame]	6	<meta name="GENERATOR" content="amaya V1.3b">
				7	</head>
				8	<body bgcolor="#ffffff">
				9
				10	<h1 align="center">The XML library for Gnome</h1>
				11	<p>
This document describes the <a href="http://www.w3.org/XML/">XML</a> library
provided in the <a href="http://www.gnome.org/">Gnome</a> framework. XML is a
standard for building tag-based structured documents. The internal document
representation is as close as possible to the <a
href="http://www.w3.org/DOM/">DOM</a> interfaces.</p>
				17
				18	<h2>xml</h2>
				19	<p>
XML is a standard for markup-based structured documents. Here is <a
name="example">an example</a>:</p>
<pre><?xml version="1.0"?>
<EXAMPLE prop1="gnome is great" prop2="&linux; too">
  <head>
    <title>Welcome to Gnome</title>
  </head>
  <chapter>
    <title>The Linux adventure</title>
    <p>bla bla bla ...</p>
    <image href="linus.gif"/>
    <p>...</p>
  </chapter>
</EXAMPLE></pre>
				34	<p>
The first line specifies that it is an XML document and can give useful
information about its encoding. The rest of the document is text whose
structure is specified by tags between angle brackets. <strong>Each opened tag
has to be closed</strong>; XML is pedantic about this. Note that, for example,
the image tag has no content (just an attribute) and is closed by ending the
tag with <code>/></code>.</p>
Daniel Veillard	ccb0963	1998-10-27 06:21:04 +0000	[diff] [blame]	41
				42	<h2>The tree output</h2>
				43	<p>
The parser returns a tree built during the document analysis. The value
returned is an <strong>xmlDocPtr</strong> (i.e. a pointer to an
<strong>xmlDoc</strong> structure). This structure contains information such as
the file name, the document type, and a <strong>root</strong> pointer which
is the root of the document (or more exactly the first child under the root,
which is the document). The tree is made of <strong>xmlNode</strong>s, chained
in doubly linked lists of siblings and linked to their parent and children.
An xmlNode can also carry properties (a chain of xmlAttr structures). An
attribute may have a value which is a list of TEXT or ENTITY_REF nodes.</p>
				53	<p>
				54	Here is an example (erroneous w.r.t. the XML spec since there should be only
				55	one ELEMENT under the root):</p>
				56	<p>
				57	<img src="structure.gif" alt=" structure.gif "></p>
				58	<p>
In the source package there is a small program (not installed by default)
called <strong>tester</strong> which parses the XML files given as arguments
and prints them back as parsed. This is useful for detecting errors both in XML
code and in the XML parser itself. It has an option <strong>--debug</strong>
which prints the actual in-memory structure of the document. Here is the result
with the <a href="#example">example</a> given before:</p>
<pre>DOCUMENT
version=1.0
standalone=true
  ELEMENT EXAMPLE
    ATTRIBUTE prop1
      TEXT
      content=gnome is great
    ATTRIBUTE prop2
      ENTITY_REF
      TEXT
      content= too
    ELEMENT head
      ELEMENT title
        TEXT
        content=Welcome to Gnome
    ELEMENT chapter
      ELEMENT title
        TEXT
        content=The Linux adventure
      ELEMENT p
        TEXT
        content=bla bla bla ...
      ELEMENT image
        ATTRIBUTE href
          TEXT
          content=linus.gif
      ELEMENT p
        TEXT
        content=...</pre>
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	94	<p>
				95	This should be useful to learn the internal representation model.</p>
Daniel Veillard	ccb0963	1998-10-27 06:21:04 +0000	[diff] [blame]	96
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	97	<h2>The XML library interfaces</h2>
				98	<p>
This section is intended to help programmers get started using the XML library
from the C language. It is not intended to be exhaustive; I hope the
automatically generated docs will provide the required completeness, but as a
separate set of documents. The interfaces of the XML library are low level by
design; there is nearly zero abstraction. Those interested in a higher level
API should <a href="#DOM">look at DOM</a> (unfortunately not completed).</p>
Daniel Veillard	ccb0963	1998-10-27 06:21:04 +0000	[diff] [blame]	106
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	107	<h3>Invoking the parser</h3>
				108	<p>
Usually, the first thing to do is to read an XML input. The parser can parse
both in-memory documents and files. The functions are defined in
"parser.h":</p>
				112	<dl>
Daniel Veillard	25940b7	1998-10-29 05:51:30 +0000	[diff] [blame]	113	<dt><code>xmlDocPtr xmlParseMemory(char *buffer, int size);</code></dt>
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	114	<dd><p>
parse an in-memory buffer of <code>size</code> bytes containing the document</p>
				116	</dd>
				117	</dl>
				118	<dl>
Daniel Veillard	25940b7	1998-10-29 05:51:30 +0000	[diff] [blame]	119	<dt><code>xmlDocPtr xmlParseFile(const char *filename);</code></dt>
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	120	<dd><p>
				121	parse an XML document contained in a file (possibly compressed)</p>
				122	</dd>
				123	</dl>
				124	<p>
Both functions return a pointer to the document structure (or NULL in case of
failure).</p>
				127	<p>
Note that this makes the parser memory-hungry: first to load the document in
memory, then to build the tree. Reading a document without building the tree
will be possible in the future by plugging code into the SAX interface (see
SAX.c).</p>
Daniel Veillard	ccb0963	1998-10-27 06:21:04 +0000	[diff] [blame]	132
Daniel Veillard	25940b7	1998-10-29 05:51:30 +0000	[diff] [blame]	133	<h3>Building a tree from scratch</h3>
				134	<p>
The other way to get an XML tree in memory is by building it. There is a set
of functions dedicated to building new elements, declared in "tree.h". Here
is, for example, the piece of code producing the example used before:</p>
<pre>  xmlDocPtr doc;
  xmlNodePtr tree, subtree;

  /* Create the document and its root element with two attributes. */
  doc = xmlNewDoc("1.0");
  doc->root = xmlNewDocNode(doc, NULL, "EXAMPLE", NULL);
  xmlSetProp(doc->root, "prop1", "gnome is great");
  xmlSetProp(doc->root, "prop2", "&linux; too");

  /* First child of the root: head, containing a title. */
  tree = xmlNewChild(doc->root, NULL, "head", NULL);
  subtree = xmlNewChild(tree, NULL, "title", "Welcome to Gnome");

  /* Second child of the root: chapter, with its four children. */
  tree = xmlNewChild(doc->root, NULL, "chapter", NULL);
  subtree = xmlNewChild(tree, NULL, "title", "The Linux adventure");
  subtree = xmlNewChild(tree, NULL, "p", "bla bla bla ...");
  subtree = xmlNewChild(tree, NULL, "image", NULL);
  xmlSetProp(subtree, "href", "linus.gif");
  subtree = xmlNewChild(tree, NULL, "p", "...");</pre>
				153	<p>
				154	Not really rocket science ...</p>
				155
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	156	<h3>Traversing the tree</h3>
				157	<p>
By including "tree.h" your code has access to the internal structure of all
the elements of the tree. The field names should be fairly self-explanatory,
like <strong>parent</strong>, <strong>childs</strong>, <strong>next</strong>,
<strong>prev</strong>, <strong>properties</strong>, etc. For example, still
with the previous example:</p>
				163	<pre><code>doc->root->childs->childs</code></pre>
				164	<p>
				165	points to the title element,</p>
<pre>doc->root->childs->next->childs->childs</pre>
				167	<p>
points to the text node containing the chapter title "The Linux adventure"
and</p>
				170	<pre>doc->root->properties->next->val</pre>
				171	<p>
points to the entity reference node for "&linux;" at the beginning of the
value of the second attribute of the root element "EXAMPLE".</p>
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	174
				175	<h3>Modifying the tree</h3>
Daniel Veillard	25940b7	1998-10-29 05:51:30 +0000	[diff] [blame]	176	<p>
Functions are provided to read and write the document content:</p>
				178	<dl>
				179	<dt><code>xmlAttrPtr xmlSetProp(xmlNodePtr node, const CHAR *name, const CHAR
				180	*value);</code></dt>
				181	<dd><p>
This sets (or changes) an attribute carried by an ELEMENT node; the value can
be NULL.</p>
				184	</dd>
				185	</dl>
				186	<dl>
				187	<dt><code>const CHAR *xmlGetProp(xmlNodePtr node, const CHAR
				188	*name);</code></dt>
				189	<dd><p>
This function returns a pointer to the property content. Note that no extra
copy is made.</p>
				192	</dd>
				193	</dl>
				194	<p>
Two functions must be used to read and write the text associated with
elements:</p>
				197	<dl>
				198	<dt><code>xmlNodePtr xmlStringGetNodeList(xmlDocPtr doc, const CHAR
				199	*value);</code></dt>
				200	<dd><p>
This function takes an "external" string and converts it to one text node or
possibly to a list of entity and text nodes. All non-predefined entity
references like &Gnome; will be stored internally as entity nodes, hence
the result of the function may not be a single node.</p>
				205	</dd>
				206	</dl>
				207	<dl>
				208	<dt><code>CHAR *xmlNodeListGetString(xmlDocPtr doc, xmlNodePtr list, int
				209	inLine);</code></dt>
				210	<dd><p>
This is the dual function, which generates a new string containing the content
of the text and entity nodes. Note the extra argument inLine: if set to 1,
instead of returning the &Gnome; XML encoding in the string, it will
substitute its value, say "GNU Network Object Model Environment". Set it if
you want to use the string for non-XML usage, such as a user interface.</p>
				216	</dd>
				217	</dl>
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	218
				219	<h3>Saving a tree</h3>
Daniel Veillard	25940b7	1998-10-29 05:51:30 +0000	[diff] [blame]	220	<p>
				221	Basically 3 options are possible:</p>
				222	<dl>
				223	<dt><code>void xmlDocDumpMemory(xmlDocPtr cur, CHAR**mem, int
				224	*size);</code></dt>
				225	<dd><p>
				226	returns a buffer where the document has been saved</p>
				227	</dd>
				228	</dl>
				229	<dl>
				230	<dt><code>extern void xmlDocDump(FILE *f, xmlDocPtr doc);</code></dt>
				231	<dd><p>
dumps the document to an open file stream (<code>FILE *</code>)</p>
				233	</dd>
				234	</dl>
				235	<dl>
				236	<dt><code>int xmlSaveFile(const char *filename, xmlDocPtr cur);</code></dt>
				237	<dd><p>
saves the document to a file. In that case the compression interface is
triggered if turned on.</p>
				240	</dd>
				241	</dl>
Daniel Veillard	10c6a8f	1998-10-28 01:00:12 +0000	[diff] [blame]	242
Daniel Veillard	25940b7	1998-10-29 05:51:30 +0000	[diff] [blame]	243	<h3>Compression</h3>
				244	<p>
The library handles compression transparently when doing file-based accesses.
The level of compression on saves can be tuned either globally or individually
for one file:</p>
				248	<dl>
				249	<dt><code>int xmlGetDocCompressMode (xmlDocPtr doc);</code></dt>
				250	<dd><p>
Get the document compression level (0-9)</p>
				252	</dd>
				253	</dl>
				254	<dl>
				255	<dt><code>void xmlSetDocCompressMode (xmlDocPtr doc, int mode);</code></dt>
				256	<dd><p>
Set the document compression level</p>
				258	</dd>
				259	</dl>
				260	<dl>
				261	<dt><code>int xmlGetCompressMode(void);</code></dt>
				262	<dd><p>
Get the default compression level</p>
				264	</dd>
				265	</dl>
				266	<dl>
				267	<dt><code>void xmlSetCompressMode(int mode);</code></dt>
				268	<dd><p>
Set the default compression level</p>
				270	</dd>
				271	</dl>
				272
				273	<h2><a name="DOM">DOM Principles</a></h2>
Daniel Veillard	ccb0963	1998-10-27 06:21:04 +0000	[diff] [blame]	274	<p>
<a href="http://www.w3.org/DOM/">DOM</a> stands for the <em>Document Object
Model</em>; it is an API for accessing XML or HTML structured documents.
Native support for DOM in Gnome is on the way (module gnome-dom), and it will
be based on gnome-xml. This will be a far cleaner interface for manipulating
XML files within Gnome since it won't expose the internal structure. DOM
defines a set of IDL (or Java) interfaces for traversing and manipulating a
document. The DOM library will allow accessing and modifying "live" documents
present in other programs, like this:</p>
Daniel Veillard	ccb0963	1998-10-27 06:21:04 +0000	[diff] [blame]	283	<p>
				284	<img src="DOM.gif" alt=" DOM.gif "></p>
				285	<p>
This should greatly help with things like modifying a gnumeric spreadsheet
embedded in a GWP document.</p>
				290	</body>
				291	</html>