Notes on WST StructuredDocument
-------------------------------

Author: Raphael Moll
Created: 2010/11/26
References: WST 3.1.x, Eclipse 3.5 Galileo

To manipulate XML documents in refactorings, we sometimes use the WST/SSE
"StructuredDocument" API. There is little documentation available on it,
so this is a short explanation of how it works, based entirely on
_empirical_ evidence. As such, it must be taken with a grain of salt.

Examples of usage can be found in
  sdk/eclipse/plugins/com.android.ide.eclipse.adt/src/com/android/ide/eclipse/adt/internal/refactorings/


1- Get a document instance
--------------------------

To get a document from an existing IFile resource:

  // 'file' is an IFile; createStructuredDocumentFor() can throw
  // IOException or CoreException.
  IModelManager modelMan = StructuredModelManager.getModelManager();
  IStructuredDocument sdoc = modelMan.createStructuredDocumentFor(file);

Note that IStructuredDocument and the associated interfaces used below are
all located under org.eclipse.wst.sse.core.internal.provisional, meaning
they _might_ change later.

Also note that this parses the content of the file on disk, not of a buffer
with pending unsaved modifications opened in an editor.

There is a counterpart for non-existent resources:

  IModelManager.createNewStructuredDocumentFor(IFile)

However, our goal so far has been to _parse_ existing documents, find
the place that we want to modify and then generate a TextFileChange
for a refactoring operation. Consequently, this document doesn't say
anything about using this model to modify content directly.


2- Structured Document overview
-------------------------------

The IStructuredDocument is organized in "regions", which are little pieces
of text.

The document contains a list of region collections, each one being
a list of regions. Each region has a type, as well as text.

Since we use this to parse XML, let's look at this XML example:

  <?xml version="1.0" encoding="utf-8"?> \n
  <resources> \n
    <color/>
    <string name="my_string">Some Value</string>  <!-- comment -->\n
  </resources>


This will result in the following regions and sub-regions:
(all the constants below are located in DOMRegionContext)

XML_PI_OPEN
  XML_PI_OPEN:<?
  XML_TAG_NAME:xml
  XML_TAG_ATTRIBUTE_NAME:version
  XML_TAG_ATTRIBUTE_EQUALS:=
  XML_TAG_ATTRIBUTE_VALUE:"1.0"
  XML_TAG_ATTRIBUTE_NAME:encoding
  XML_TAG_ATTRIBUTE_EQUALS:=
  XML_TAG_ATTRIBUTE_VALUE:"utf-8"
  XML_PI_CLOSE:?>

XML_CONTENT
  XML_CONTENT:\n

XML_TAG_NAME
  XML_TAG_OPEN:<
  XML_TAG_NAME:resources
  XML_TAG_CLOSE:>

XML_CONTENT
  XML_CONTENT:\n + whitespace before color

XML_TAG_NAME
  XML_TAG_OPEN:<
  XML_TAG_NAME:color
  XML_EMPTY_TAG_CLOSE:/>

XML_CONTENT
  XML_CONTENT:\n + whitespace before string

XML_TAG_NAME
  XML_TAG_OPEN:<
  XML_TAG_NAME:string
  XML_TAG_ATTRIBUTE_NAME:name
  XML_TAG_ATTRIBUTE_EQUALS:=
  XML_TAG_ATTRIBUTE_VALUE:"my_string"
  XML_TAG_CLOSE:>

XML_CONTENT
  XML_CONTENT:Some Value

XML_TAG_NAME
  XML_END_TAG_OPEN:</
  XML_TAG_NAME:string
  XML_TAG_CLOSE:>

XML_CONTENT
  XML_CONTENT: (2 spaces before the comment)

XML_COMMENT_TEXT
  XML_COMMENT_OPEN:<!--
  XML_COMMENT_TEXT: comment
  XML_COMMENT_CLOSE:-->

XML_CONTENT
  XML_CONTENT: \n after comment

XML_TAG_NAME
  XML_END_TAG_OPEN:</
  XML_TAG_NAME:resources
  XML_TAG_CLOSE:>

XML_CONTENT
  XML_CONTENT:
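
As a rough illustration of the outer level of the listing above, here is a
minimal stand-alone Java sketch that cuts an XML string into top-level chunks,
one per region collection. It is a toy scanner, not the WST parser: the names
OuterRegionSketch, split, endAfter and SAMPLE are made up for this note, the
sample string assumes a trailing newline, and an attribute value containing
'>' would confuse it. Only the type names come from the listing.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy splitter (hypothetical, not WST): one chunk per top-level region collection. */
class OuterRegionSketch {

    // Assumption: the sample ends with a newline, which yields the last XML_CONTENT chunk.
    static final String SAMPLE =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
            + "<resources>\n"
            + "  <color/>\n"
            + "  <string name=\"my_string\">Some Value</string>  <!-- comment -->\n"
            + "</resources>\n";

    /** Returns entries of the form "TYPE:text", in document order. */
    static List<String> split(String xml) {
        List<String> out = new ArrayList<String>();
        int i = 0;
        while (i < xml.length()) {
            String type;
            int end;
            if (xml.startsWith("<!--", i)) {
                type = "XML_COMMENT_TEXT";
                end = endAfter(xml, "-->", i);
            } else if (xml.startsWith("<?", i)) {
                type = "XML_PI_OPEN";
                end = endAfter(xml, "?>", i);
            } else if (xml.charAt(i) == '<') {
                type = "XML_TAG_NAME";
                end = endAfter(xml, ">", i);
            } else {
                // Everything up to the next '<' is one content chunk.
                type = "XML_CONTENT";
                int next = xml.indexOf('<', i);
                end = next < 0 ? xml.length() : next;
            }
            out.add(type + ":" + xml.substring(i, end));
            i = end;
        }
        return out;
    }

    /** Index just past the first marker found at or after 'from', or end of text if absent. */
    private static int endAfter(String xml, String marker, int from) {
        int idx = xml.indexOf(marker, from);
        return idx < 0 ? xml.length() : idx + marker.length();
    }

    public static void main(String[] args) {
        for (String chunk : split(SAMPLE)) {
            // Show newlines as \n so each chunk stays on one line.
            System.out.println(chunk.replace("\n", "\\n"));
        }
    }
}
```

The chunks come out in the same order as the outer entries of the listing.
The real parser then breaks each chunk into the inner regions shown there,
which this sketch does not attempt.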


3- Iterating through regions
----------------------------

To iterate through all regions, we need to process the list of top-level regions and then
iterate over inner regions:

  for (IStructuredDocumentRegion regions : sdoc.getStructuredDocumentRegions()) {
      // process inner regions
      for (int i = 0; i < regions.getNumberOfRegions(); i++) {
          ITextRegion region = regions.getRegions().get(i);
          String type = region.getType();
          String text = regions.getText(region);
      }
  }

Each "region collection" basically matches one XML tag, with sub-regions for all the tokens
inside a tag.
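
To make the two-level shape concrete without needing Eclipse on the classpath,
here is a small stand-alone Java model of one region collection. It is only a
sketch: Region, RegionGroup, sampleTag and dump are hypothetical names invented
for this note, not WST classes, and the offsets for <string name="my_string">
are filled in by hand. It mirrors the one point that matters in the loop above,
namely that the text of an inner region is obtained from its owner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-alone model of one region collection. These are NOT the WST classes. */
class RegionWalkSketch {

    /** Inner region: a type plus offsets relative to the start of its owner. */
    static final class Region {
        final String type;
        final int start;       // relative to the owning group (assumption of this model)
        final int length;      // includes trailing whitespace
        final int textLength;  // excludes trailing whitespace

        Region(String type, int start, int length, int textLength) {
            this.type = type;
            this.start = start;
            this.length = length;
            this.textLength = textLength;
        }
    }

    /** Outer region collection: owns the characters and the inner regions. */
    static final class RegionGroup {
        final String fullText;
        final List<Region> regions;

        RegionGroup(String fullText, List<Region> regions) {
            this.fullText = fullText;
            this.regions = regions;
        }

        /** Text of an inner region without its trailing whitespace. */
        String getText(Region r) {
            return fullText.substring(r.start, r.start + r.textLength);
        }

        /** Text of an inner region including its trailing whitespace. */
        String getFullText(Region r) {
            return fullText.substring(r.start, r.start + r.length);
        }
    }

    /** Hand-built group for the opening tag <string name="my_string">. */
    static RegionGroup sampleTag() {
        return new RegionGroup("<string name=\"my_string\">", Arrays.asList(
                new Region("XML_TAG_OPEN", 0, 1, 1),
                new Region("XML_TAG_NAME", 1, 7, 6),
                new Region("XML_TAG_ATTRIBUTE_NAME", 8, 4, 4),
                new Region("XML_TAG_ATTRIBUTE_EQUALS", 12, 1, 1),
                new Region("XML_TAG_ATTRIBUTE_VALUE", 13, 11, 11),
                new Region("XML_TAG_CLOSE", 24, 1, 1)));
    }

    /** Walks the inner regions by index and renders each one as "TYPE:text". */
    static List<String> dump(RegionGroup group) {
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i < group.regions.size(); i++) {
            Region r = group.regions.get(i);
            lines.add(r.type + ":" + group.getText(r));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : dump(sampleTag())) {
            System.out.println(line);
        }
    }
}
```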

Note that an XML_CONTENT region holds the text between tags, including whitespace-only runs;
it is what is known as a TEXT node in the W3C DOM.

Also note that each outer region has a type, and the inner regions reuse the same set of types.
For example, an outer XML_TAG_NAME region collection is a proper XML tag: it contains an
XML_TAG_OPEN, an XML_TAG_CLOSE and also an inner XML_TAG_NAME that is the tag name itself.

Surprisingly, the inner regions do not have many access methods we can use on them, except their
type and start/length/end. There are two length and end methods:
- getLength() and getEnd() include any trailing whitespace.
- getTextLength() and getTextEnd() exclude some typical trailing whitespace.

Regarding the trailing whitespace, empirical evidence shows that in the XML case here, the
only place where it matters is in a tag such as <string name="my_string">: for the
XML_TAG_NAME region, getLength is 7 (string + space) and getTextLength is 6 (string, no space).
Whitespace between XML elements is collapsed into its own region.
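
The 7 versus 6 arithmetic can be reproduced with a few lines of plain Java.
This is a toy measurement written for this note, not WST's tokenizer: the
class TrailingWhitespaceSketch and its methods are hypothetical, and the set
of characters treated as part of a name is an assumption.

```java
/** Toy measurement of a tag-name token; an illustration, not WST's tokenizer. */
class TrailingWhitespaceSketch {

    /** Number of name characters starting at offset (the getTextLength analogue). */
    static int textLength(String s, int offset) {
        int i = offset;
        while (i < s.length() && isNameChar(s.charAt(i))) {
            i++;
        }
        return i - offset;
    }

    /** Name characters plus the whitespace run that follows (the getLength analogue). */
    static int length(String s, int offset) {
        int i = offset + textLength(s, offset);
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i - offset;
    }

    // Assumption: a simplified notion of which characters may appear in a name.
    private static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == ':' || c == '.';
    }

    public static void main(String[] args) {
        String tag = "<string name=\"my_string\">";
        // The tag name starts at offset 1, right after '<'.
        System.out.println("textLength = " + textLength(tag, 1));
        System.out.println("length     = " + length(tag, 1));
    }
}
```

For a tag without attributes, such as <color/>, nothing follows the name
before the closing token, so both values are the same.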

If you want the text of the inner region, you actually need to query it from the outer region.
The outer IStructuredDocumentRegion (the region collection) contains many more useful access
methods, some of which return details on the inner regions:
- getText : without the whitespace.
- getFullText : with the whitespace.
- getStart / getLength / getEnd : type-dependent offset, including whitespace.
- getStart / getTextLength / getTextEnd : type-dependent offset, excluding "irrelevant" whitespace.
- getStartOffset / getEndOffset / getTextEndOffset : relative to document.

Empirical evidence shows that there is no discernible difference between the getStart/getEnd
values and those returned by getStartOffset/getEndOffset. Even so, use each method as its
javadoc describes.
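
Since the end goal is a text change expressed as document offsets, here is a
small stand-alone sketch of the offset arithmetic. It assumes that an inner
region's start is relative to its outer region, so the document offset is the
sum of the two; that is our reading, not something verified here. The class
DocumentOffsetSketch and its methods are hypothetical, and a plain string
splice stands in for the actual TextFileChange.

```java
/** Sketch: turn a region-relative offset into a document offset, then splice text. */
class DocumentOffsetSketch {

    static final String DOC =
            "<resources>\n"
            + "  <string name=\"my_string\">Some Value</string>\n"
            + "</resources>\n";

    /** Assumption: document offset = start of the outer region + relative inner start. */
    static int toDocumentOffset(int outerStartOffset, int innerStart) {
        return outerStartOffset + innerStart;
    }

    /** Replaces 'length' characters at 'offset'; the same data a replace edit would carry. */
    static String replace(String doc, int offset, int length, String replacement) {
        return doc.substring(0, offset) + replacement + doc.substring(offset + length);
    }

    /** Renames the attribute value of the sample using offsets only. */
    static String renameSample() {
        // Document offset of the region collection for <string name="my_string">.
        int outerStart = DOC.indexOf("<string");
        // XML_TAG_ATTRIBUTE_VALUE inside that tag: relative start 13, text length 11
        // (the 9 characters of my_string plus both quotes). Values filled in by hand.
        int valueStart = 13;
        int valueTextLength = 11;
        int offset = toDocumentOffset(outerStart, valueStart);
        return replace(DOC, offset, valueTextLength, "\"new_name\"");
    }

    public static void main(String[] args) {
        System.out.print(renameSample());
    }
}
```

Using the text length rather than the full length keeps any trailing
whitespace of the region out of the replaced range.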

All offsets start at zero.

Given a region collection, you can also browse regions using the getRegions() list,
getFirstRegion/getLastRegion, or getRegionAtCharacterOffset(). Iterating the region
list seems the most useful scenario. There's no actual iterator provided for inner regions.
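
For completeness, this is roughly what a lookup by character offset amounts
to, sketched in stand-alone Java. RegionLookupSketch, Region and regionAt are
hypothetical names for this note; the sketch uses offsets relative to the tag,
and we have not checked which kind of offset the real
getRegionAtCharacterOffset() expects.

```java
import java.util.Arrays;
import java.util.List;

/** Stand-alone sketch of an offset lookup over inner regions; not the WST classes. */
class RegionLookupSketch {

    static final class Region {
        final String type;
        final int start;   // relative to the start of the tag (assumption of this model)
        final int length;  // includes trailing whitespace

        Region(String type, int start, int length) {
            this.type = type;
            this.start = start;
            this.length = length;
        }

        /** True when offset falls in the half-open range [start, start + length). */
        boolean contains(int offset) {
            return offset >= start && offset < start + length;
        }
    }

    /** Inner regions of <string name="my_string">, built by hand. */
    static List<Region> sampleTag() {
        return Arrays.asList(
                new Region("XML_TAG_OPEN", 0, 1),
                new Region("XML_TAG_NAME", 1, 7),
                new Region("XML_TAG_ATTRIBUTE_NAME", 8, 4),
                new Region("XML_TAG_ATTRIBUTE_EQUALS", 12, 1),
                new Region("XML_TAG_ATTRIBUTE_VALUE", 13, 11),
                new Region("XML_TAG_CLOSE", 24, 1));
    }

    /** Returns the region covering the offset, or null when the offset is out of range. */
    static Region regionAt(List<Region> regions, int offset) {
        for (Region r : regions) {
            if (r.contains(offset)) {
                return r;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Region hit = regionAt(sampleTag(), 15);
        System.out.println(hit == null ? "none" : hit.type);
    }
}
```

Note that in this model the space after the tag name (offset 7) still belongs
to the XML_TAG_NAME region, consistent with getLength including it.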

There are a few other methods available in the region classes. This is not an exhaustive list.


----