| page.title=Parsing XML Data |
| parent.title=Performing Network Operations |
| parent.link=index.html |
| |
| trainingnavtop=true |
| |
| previous.title=Managing Network Usage |
| previous.link=managing.html |
| |
| @jd:body |
| |
| <div id="tb-wrapper"> |
| <div id="tb"> |
| |
| |
| |
| <h2>This lesson teaches you to</h2> |
| <ol> |
| <li><a href="#choose">Choose a Parser</a></li> |
| <li><a href="#analyze">Analyze the Feed</a></li> |
| <li><a href="#instantiate">Instantiate the Parser</a></li> |
| <li><a href="#read">Read the Feed</a></li> |
| <li><a href="#parse">Parse XML</a></li> |
| <li><a href="#skip">Skip Tags You Don't Care About</a></li> |
| <li><a href="#consume">Consume XML Data</a></li> |
| </ol> |
| |
| <h2>You should also read</h2> |
| <ul> |
| <li><a href="{@docRoot}guide/webapps/index.html">Web Apps Overview</a></li> |
| </ul> |
| |
| <h2>Try it out</h2> |
| |
| <div class="download-box"> |
| <a href="{@docRoot}shareables/training/NetworkUsage.zip" |
| class="button">Download the sample</a> |
| <p class="filename">NetworkUsage.zip</p> |
| </div> |
| |
| </div> |
| </div> |
| |
| <p>Extensible Markup Language (XML) is a set of rules for encoding documents in |
| machine-readable form. XML is a popular format for sharing data on the internet. |
| Websites that frequently update their content, such as news sites or blogs, |
| often provide an XML feed so that external programs can keep abreast of content |
| changes. Downloading and parsing XML data is a common task for network-connected |
| apps. This lesson explains how to parse XML documents and use their data.</p> |
| |
| <h2 id="choose">Choose a Parser</h2> |
| |
| <p>We recommend {@link org.xmlpull.v1.XmlPullParser}, which is an efficient and |
| maintainable way to parse XML on Android. Historically Android has had two |
| implementations of this interface:</p> |
| |
| <ul> |
| <li><a href="http://kxml.sourceforge.net/"><code>KXmlParser</code></a> |
| via {@link org.xmlpull.v1.XmlPullParserFactory#newPullParser XmlPullParserFactory.newPullParser()}. |
| </li> |
| <li><code>ExpatPullParser</code>, via |
| {@link android.util.Xml#newPullParser Xml.newPullParser()}. |
| </li> |
| </ul> |
| |
| <p>Either choice is fine. The |
| example in this section uses <code>ExpatPullParser</code>, via |
| {@link android.util.Xml#newPullParser Xml.newPullParser()}. </p> |
| |
| <h2 id="analyze">Analyze the Feed</h2> |
| |
| <p>The first step in parsing a feed is to decide which fields you're interested in. |
| The parser extracts data for those fields and ignores the rest.</p> |
| |
| <p>Here is an excerpt from the feed that's being parsed in the sample app. Each |
| post to <a href="http://stackoverflow.com">StackOverflow.com</a> appears in the |
| feed as an <code>entry</code> tag that contains several nested tags:</p> |
| |
| <pre><?xml version="1.0" encoding="utf-8"?> |
| <feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" ..."> |
| <title type="text">newest questions tagged android - Stack Overflow</title> |
| ... |
| <entry> |
| ... |
| </entry> |
| <entry> |
| <id>http://stackoverflow.com/q/9439999</id> |
| <re:rank scheme="http://stackoverflow.com">0</re:rank> |
| <title type="text">Where is my data file?</title> |
| <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="android"/> |
| <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="file"/> |
| <author> |
| <name>cliff2310</name> |
| <uri>http://stackoverflow.com/users/1128925</uri> |
| </author> |
| <link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" /> |
| <published>2012-02-25T00:30:54Z</published> |
| <updated>2012-02-25T00:30:54Z</updated> |
| <summary type="html"> |
| <p>I have an Application that requires a data file...</p> |
| |
| </summary> |
| </entry> |
| <entry> |
| ... |
| </entry> |
| ... |
| </feed></pre> |
| |
| <p>The sample app |
| extracts data for the <code>entry</code> tag and its nested tags |
| <code>title</code>, <code>link</code>, and <code>summary</code>.</p> |
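<p>One way to decide which fields matter is to survey a saved copy of the feed
first. The sketch below is only an illustration: it uses the JDK's StAX
<code>XMLStreamReader</code> as a stand-in for <code>XmlPullParser</code> so
that it runs off-device, and the class name <code>FeedSurvey</code> and the
sample feed are hypothetical.</p>

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of the sample app.
final class FeedSurvey {
    // Counts how often each element name occurs in the given XML text.
    static Map<String, Integer> countTags(String xml) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up miniature feed, shaped like the excerpt above.
        String sample = "<feed><title>t</title>"
                + "<entry><title>a</title><link rel='alternate' href='http://example.com/a'/></entry>"
                + "<entry><title>b</title><summary>s</summary></entry></feed>";
        System.out.println(countTags(sample));
    }
}
```

<p>Tags that occur once per <code>entry</code> and carry the text you want to
display are the usual candidates for extraction.</p>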
| |
| |
| <h2 id="instantiate">Instantiate the Parser</h2> |
| |
| <p>The next step is to |
| instantiate a parser and kick off the parsing process. In this snippet, a parser |
| is initialized to not process namespaces, and to use the provided {@link |
| java.io.InputStream} as its input. It starts the parsing process with a call to |
| {@link org.xmlpull.v1.XmlPullParser#nextTag() nextTag()} and invokes the |
| <code>readFeed()</code> method, which extracts and processes the data the app is |
| interested in:</p> |
| |
| <pre>public class StackOverflowXmlParser { |
| // We don't use namespaces |
| private static final String ns = null; |
| |
| public List<Entry> parse(InputStream in) throws XmlPullParserException, IOException { |
| try { |
| XmlPullParser parser = Xml.newPullParser(); |
| parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false); |
| parser.setInput(in, null); |
| parser.nextTag(); |
| return readFeed(parser); |
| } finally { |
| in.close(); |
| } |
| } |
| ... |
| }</pre> |
| |
| <h2 id="read">Read the Feed</h2> |
| |
| <p>The <code>readFeed()</code> method does the actual work of processing the |
| feed. It looks for elements tagged "entry" as a starting point for recursively |
| processing the feed. If a tag isn't an {@code entry} tag, it skips it. Once the whole |
| feed has been recursively processed, <code>readFeed()</code> returns a {@link |
| java.util.List} containing the entries (including nested data members) it |
| extracted from the feed. This {@link java.util.List} is then returned by the |
| parser.</p> |
| |
| <pre> |
| private List<Entry> readFeed(XmlPullParser parser) throws XmlPullParserException, IOException { |
| List<Entry> entries = new ArrayList<Entry>(); |
| |
| parser.require(XmlPullParser.START_TAG, ns, "feed"); |
| while (parser.next() != XmlPullParser.END_TAG) { |
| if (parser.getEventType() != XmlPullParser.START_TAG) { |
| continue; |
| } |
| String name = parser.getName(); |
| // Starts by looking for the entry tag |
| if (name.equals("entry")) { |
| entries.add(readEntry(parser)); |
| } else { |
| skip(parser); |
| } |
| } |
| return entries; |
| }</pre> |
| |
| |
| <h2 id="parse">Parse XML</h2> |
| |
| |
| <p>The steps for parsing an XML feed are as follows:</p> |
| <ol> |
| |
| <li>As described in <a href="#analyze">Analyze the Feed</a>, identify the tags you want to include in your app. This |
| example extracts data for the <code>entry</code> tag and its nested tags |
| <code>title</code>, <code>link</code>, and <code>summary</code>.</li> |
| |
| <li>Create the following methods: |
| |
| <ul> |
| |
| <li>A "read" method for each tag you're interested in. For example, |
| <code>readEntry()</code>, <code>readTitle()</code>, and so on. The parser reads |
| tags from the input stream. When it encounters a tag named <code>entry</code>, |
| <code>title</code>, |
| <code>link</code> or <code>summary</code>, it calls the appropriate method |
| for that tag. Otherwise, it skips the tag. |
| </li> |
| |
| <li>Methods to extract data for each different type of tag and to advance the |
| parser to the next tag. For example: |
| <ul> |
| |
| <li>For the <code>title</code> and <code>summary</code> tags, the parser calls |
| <code>readText()</code>. This method extracts data for these tags by calling |
| <code>parser.getText()</code>.</li> |
| |
| <li>For the <code>link</code> tag, the parser extracts data for links by first |
| determining if the link is the kind |
| it's interested in. Then it uses <code>parser.getAttributeValue()</code> to |
| extract the link's value.</li> |
| |
| <li>For the <code>entry</code> tag, the parser calls <code>readEntry()</code>. |
| This method parses the entry's nested tags and returns an <code>Entry</code> |
| object with the data members <code>title</code>, <code>link</code>, and |
| <code>summary</code>.</li> |
| |
| </ul> |
| </li> |
| <li>A helper <code>skip()</code> method that's recursive. For more discussion of this topic, see <a href="#skip">Skip Tags You Don't Care About</a>.</li> |
| </ul> |
| |
| </li> |
| </ol> |
| |
| <p>This snippet shows how the parser parses entries, titles, links, and summaries.</p> |
| <pre>public static class Entry { |
| public final String title; |
| public final String link; |
| public final String summary; |
| |
| private Entry(String title, String summary, String link) { |
| this.title = title; |
| this.summary = summary; |
| this.link = link; |
| } |
| } |
| |
| // Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off |
| // to their respective "read" methods for processing. Otherwise, skips the tag. |
| private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException { |
| parser.require(XmlPullParser.START_TAG, ns, "entry"); |
| String title = null; |
| String summary = null; |
| String link = null; |
| while (parser.next() != XmlPullParser.END_TAG) { |
| if (parser.getEventType() != XmlPullParser.START_TAG) { |
| continue; |
| } |
| String name = parser.getName(); |
| if (name.equals("title")) { |
| title = readTitle(parser); |
| } else if (name.equals("summary")) { |
| summary = readSummary(parser); |
| } else if (name.equals("link")) { |
| link = readLink(parser); |
| } else { |
| skip(parser); |
| } |
| } |
| return new Entry(title, summary, link); |
| } |
| |
| // Processes title tags in the feed. |
| private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException { |
| parser.require(XmlPullParser.START_TAG, ns, "title"); |
| String title = readText(parser); |
| parser.require(XmlPullParser.END_TAG, ns, "title"); |
| return title; |
| } |
| |
| // Processes link tags in the feed. |
| private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException { |
| String link = ""; |
| parser.require(XmlPullParser.START_TAG, ns, "link"); |
| String relType = parser.getAttributeValue(null, "rel"); |
| // Only "alternate" links are of interest. Comparing this way also tolerates a missing rel attribute. |
| if ("alternate".equals(relType)) { |
| link = parser.getAttributeValue(null, "href"); |
| } |
| // <link> is an empty element in this feed, so the next tag is its END_TAG. |
| // Advance for every link, not only "alternate" ones, so the check below holds. |
| parser.nextTag(); |
| parser.require(XmlPullParser.END_TAG, ns, "link"); |
| return link; |
| } |
| |
| // Processes summary tags in the feed. |
| private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException { |
| parser.require(XmlPullParser.START_TAG, ns, "summary"); |
| String summary = readText(parser); |
| parser.require(XmlPullParser.END_TAG, ns, "summary"); |
| return summary; |
| } |
| |
| // For the tags title and summary, extracts their text values. |
| private String readText(XmlPullParser parser) throws IOException, XmlPullParserException { |
| String result = ""; |
| if (parser.next() == XmlPullParser.TEXT) { |
| result = parser.getText(); |
| parser.nextTag(); |
| } |
| return result; |
| } |
| ... |
| }</pre> |
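<p>To try the same read/skip flow without a device, the sketch below mirrors
<code>readFeed()</code>, <code>readEntry()</code>, and <code>skip()</code> using
the JDK's StAX <code>XMLStreamReader</code> in place of
<code>XmlPullParser</code>. It is an approximation under stated assumptions, not
the sample app's code: the class name <code>StaxFeedSketch</code>, the inline
feed, and the <code>example.com</code> URL are made up for illustration.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical off-device analogue of StackOverflowXmlParser, built on StAX.
final class StaxFeedSketch {
    static final class Entry {
        final String title;
        final String summary;
        final String link;

        Entry(String title, String summary, String link) {
            this.title = title;
            this.summary = summary;
            this.link = link;
        }
    }

    // Collects every <entry> under <feed>; all other children are skipped.
    static List<Entry> readFeed(String xml) {
        List<Entry> entries = new ArrayList<>();
        try {
            XMLStreamReader r =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // advance to the root element
            r.require(XMLStreamConstants.START_ELEMENT, null, "feed");
            while (r.next() != XMLStreamConstants.END_ELEMENT) {
                if (r.getEventType() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if (r.getLocalName().equals("entry")) {
                    entries.add(readEntry(r));
                } else {
                    skip(r);
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed feed", e);
        }
        return entries;
    }

    // Reads title, summary, and the "alternate" link of one entry.
    static Entry readEntry(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "entry");
        String title = null;
        String summary = null;
        String link = null;
        while (r.next() != XMLStreamConstants.END_ELEMENT) {
            if (r.getEventType() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            switch (r.getLocalName()) {
                case "title":
                    title = r.getElementText(); // leaves the reader on </title>
                    break;
                case "summary":
                    summary = r.getElementText();
                    break;
                case "link":
                    if ("alternate".equals(r.getAttributeValue(null, "rel"))) {
                        link = r.getAttributeValue(null, "href");
                    }
                    skip(r); // consume through the matching end of <link>
                    break;
                default:
                    skip(r);
            }
        }
        return new Entry(title, summary, link);
    }

    // Consumes events until the element the reader is currently on is closed.
    static void skip(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth != 0) {
            switch (r.next()) {
                case XMLStreamConstants.END_ELEMENT: depth--; break;
                case XMLStreamConstants.START_ELEMENT: depth++; break;
                default: break;
            }
        }
    }

    public static void main(String[] args) {
        String feed = "<feed xmlns='http://www.w3.org/2005/Atom'><title>demo</title>"
                + "<entry><author><name>n</name></author><title>Where is my data file?</title>"
                + "<link rel='alternate' href='http://example.com/q/1'/>"
                + "<summary>text</summary></entry></feed>";
        for (Entry e : readFeed(feed)) {
            System.out.println(e.title + " -> " + e.link);
        }
    }
}
```

<p>The control flow is the same as in the Android snippet: each "read" method
leaves the reader on the end tag of the element it handled, so the enclosing
loop can simply ask for the next event.</p>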
| |
| <h2 id="skip">Skip Tags You Don't Care About</h2> |
| |
| <p>One of the steps in the XML parsing described above is for the parser to skip tags it's not interested in. Here is the parser's <code>skip()</code> method:</p> |
| |
| <pre> |
| private void skip(XmlPullParser parser) throws XmlPullParserException, IOException { |
| if (parser.getEventType() != XmlPullParser.START_TAG) { |
| throw new IllegalStateException(); |
| } |
| int depth = 1; |
| while (depth != 0) { |
| switch (parser.next()) { |
| case XmlPullParser.END_TAG: |
| depth--; |
| break; |
| case XmlPullParser.START_TAG: |
| depth++; |
| break; |
| } |
| } |
| } |
| </pre> |
| |
| <p>This is how it works:</p> |
| |
| <ul> |
| |
| <li>It throws an exception if the current event isn't a |
| <code>START_TAG</code>.</li> |
| |
| <li>It consumes the <code>START_TAG</code>, and all events up to and including |
| the matching <code>END_TAG</code>.</li> |
| |
| <li>To make sure that it stops at the correct <code>END_TAG</code> and not at |
| the first tag it encounters after the original <code>START_TAG</code>, it keeps |
| track of the nesting depth.</li> |
| |
| </ul> |
| |
| <p>Thus if the current element has nested elements, the value of |
| <code>depth</code> won't be 0 until the parser has consumed all events between |
| the original <code>START_TAG</code> and its matching <code>END_TAG</code>. For |
| example, consider how the parser skips the <code><author></code> element, |
| which has 2 nested elements, <code><name></code> and |
| <code><uri></code>:</p> |
| |
| <ul> |
| |
| <li>The first time through the <code>while</code> loop, the next tag the parser |
| encounters after <code><author></code> is the <code>START_TAG</code> for |
| <code><name></code>. The value for <code>depth</code> is incremented to |
| 2.</li> |
| |
| <li>The second time through the <code>while</code> loop, the next tag the parser |
| encounters is the <code>END_TAG</code> <code></name></code>. The value |
| for <code>depth</code> is decremented to 1.</li> |
| |
| <li>The third time through the <code>while</code> loop, the next tag the parser |
| encounters is the <code>START_TAG</code> <code><uri></code>. The value |
| for <code>depth</code> is incremented to 2.</li> |
| |
| <li>The fourth time through the <code>while</code> loop, the next tag the parser |
| encounters is the <code>END_TAG</code> <code></uri></code>. The value for |
| <code>depth</code> is decremented to 1.</li> |
| |
| <li>The fifth and final time through the <code>while</code> loop, the next |
| tag the parser encounters is the <code>END_TAG</code> |
| <code></author></code>. The value for <code>depth</code> is decremented to |
| 0, indicating that the <code><author></code> element has been successfully |
| skipped.</li> |
| |
| </ul> |
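<p>The walkthrough above can be replayed in isolation. This small sketch does
not touch a real parser: it feeds a hand-written list of tag events (a leading
<code>/</code> marks an end tag) through the same counter logic as
<code>skip()</code>. The class name <code>SkipDepthTrace</code> and the trailing
<code>link</code> event are hypothetical and only show that processing stops
once <code>depth</code> reaches 0.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical trace of the depth counter used by skip().
final class SkipDepthTrace {
    // Returns the depth after each consumed event; stops once depth hits 0.
    static List<Integer> trace(List<String> eventsAfterStartTag) {
        List<Integer> depths = new ArrayList<>();
        int depth = 1; // accounts for the START_TAG that skip() was called on
        for (String tag : eventsAfterStartTag) {
            if (depth == 0) {
                break; // the matching END_TAG has been consumed
            }
            depth += tag.startsWith("/") ? -1 : 1;
            depths.add(depth);
        }
        return depths;
    }

    public static void main(String[] args) {
        // Events that follow <author> in the feed excerpt, plus the next sibling.
        List<String> events = Arrays.asList("name", "/name", "uri", "/uri", "/author", "link");
        System.out.println(trace(events)); // prints [2, 1, 2, 1, 0]
    }
}
```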
| |
| <h2 id="consume">Consume XML Data</h2> |
| |
| <p>The example application fetches and parses the XML feed within an {@link |
| android.os.AsyncTask}. This takes the processing off the main UI thread. When |
| processing is complete, the app updates the UI in the main activity |
| (<code>NetworkActivity</code>).</p> |
| <p>In the excerpt shown below, the <code>loadPage()</code> method does the |
| following:</p> |
| |
| <ul> |
| |
| <li>Initializes a string variable with the URL for the XML feed.</li> |
| |
| <li>If the user's settings and the network connection allow it, invokes |
| <code>new DownloadXmlTask().execute(url)</code>. This instantiates a new |
| <code>DownloadXmlTask</code> object ({@link android.os.AsyncTask} subclass) and |
| runs its {@link android.os.AsyncTask#execute execute()} method, which downloads |
| and parses the feed and returns a string result to be displayed in the UI.</li> |
| |
| </ul> |
| <pre> |
| public class NetworkActivity extends Activity { |
| public static final String WIFI = "Wi-Fi"; |
| public static final String ANY = "Any"; |
| private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest"; |
| |
| // Whether there is a Wi-Fi connection. |
| private static boolean wifiConnected = false; |
| // Whether there is a mobile connection. |
| private static boolean mobileConnected = false; |
| // Whether the display should be refreshed. |
| public static boolean refreshDisplay = true; |
| public static String sPref = null; |
| |
| ... |
| |
| // Uses AsyncTask to download the XML feed from stackoverflow.com. |
| public void loadPage() { |
| |
| if ((sPref.equals(ANY)) && (wifiConnected || mobileConnected)) { |
| new DownloadXmlTask().execute(URL); |
| } else if ((sPref.equals(WIFI)) && (wifiConnected)) { |
| new DownloadXmlTask().execute(URL); |
| } else { |
| // show error |
| } |
| }</pre> |
| |
| <p>The {@link android.os.AsyncTask} subclass shown below, |
| <code>DownloadXmlTask</code>, implements the following {@link |
| android.os.AsyncTask} methods:</p> |
| |
| <ul> |
| |
| <li>{@link android.os.AsyncTask#doInBackground doInBackground()} executes |
| the method <code>loadXmlFromNetwork()</code>. It passes the feed URL as a |
| parameter. The method <code>loadXmlFromNetwork()</code> fetches and processes |
| the feed. When it finishes, it passes back a result string.</li> |
| |
| <li>{@link android.os.AsyncTask#onPostExecute onPostExecute()} takes the |
| returned string and displays it in the UI.</li> |
| |
| </ul> |
| |
| <pre> |
| // Implementation of AsyncTask used to download XML feed from stackoverflow.com. |
| private class DownloadXmlTask extends AsyncTask<String, Void, String> { |
| @Override |
| protected String doInBackground(String... urls) { |
| try { |
| return loadXmlFromNetwork(urls[0]); |
| } catch (IOException e) { |
| return getResources().getString(R.string.connection_error); |
| } catch (XmlPullParserException e) { |
| return getResources().getString(R.string.xml_error); |
| } |
| } |
| |
| @Override |
| protected void onPostExecute(String result) { |
| setContentView(R.layout.main); |
| // Displays the HTML string in the UI via a WebView |
| WebView myWebView = (WebView) findViewById(R.id.webview); |
| myWebView.loadData(result, "text/html", null); |
| } |
| }</pre> |
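<p>{@link android.os.AsyncTask} is deprecated as of API level 30, so newer code
may prefer <code>java.util.concurrent</code>. The sketch below shows the same
background-then-deliver shape with an <code>ExecutorService</code> and a
<code>CompletableFuture</code>. It is a plain-Java illustration, not the sample
app's code: <code>BackgroundFeedLoader</code> and its parameters are
hypothetical, and on Android the result callback would still need to be posted
to the main thread before touching views, which is omitted here.</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for DownloadXmlTask, using only java.util.concurrent.
final class BackgroundFeedLoader {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Runs fetchAndParse off the calling thread (the doInBackground role),
    // substitutes errorMessage on failure, then hands the string to onResult
    // (the onPostExecute role). The callback runs on a background thread here.
    CompletableFuture<Void> load(String url,
                                 Function<String, String> fetchAndParse,
                                 String errorMessage,
                                 Consumer<String> onResult) {
        return CompletableFuture
                .supplyAsync(() -> fetchAndParse.apply(url), worker)
                .exceptionally(error -> errorMessage)
                .thenAccept(onResult);
    }

    // Releases the worker thread when the loader is no longer needed.
    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        BackgroundFeedLoader loader = new BackgroundFeedLoader();
        // The lambda fakes the download-and-parse step for demonstration.
        loader.load("http://example.com/feed",
                url -> "<h3>Loaded " + url + "</h3>",
                "Unable to load content.",
                System.out::println).join();
        loader.shutdown();
    }
}
```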
| |
| <p>Below is the method <code>loadXmlFromNetwork()</code> that is invoked from |
| <code>DownloadXmlTask</code>. It does the following:</p> |
| |
| <ol> |
| |
| <li>Instantiates a <code>StackOverflowXmlParser</code>. It also creates variables for |
| a {@link java.util.List} of <code>Entry</code> objects (<code>entries</code>), and |
| <code>title</code>, <code>url</code>, and <code>summary</code>, to hold the |
| values extracted from the XML feed for those fields.</li> |
| |
| <li>Calls <code>downloadUrl()</code>, which fetches the feed and returns it as |
| an {@link java.io.InputStream}.</li> |
| |
| <li>Uses <code>StackOverflowXmlParser</code> to parse the {@link java.io.InputStream}. |
| <code>StackOverflowXmlParser</code> populates a |
| {@link java.util.List} of <code>entries</code> with data from the feed.</li> |
| |
| <li>Processes the <code>entries</code> {@link java.util.List}, |
| and combines the feed data with HTML markup.</li> |
| |
| <li>Returns an HTML string that is displayed in the main activity |
| UI by the {@link android.os.AsyncTask} method {@link |
| android.os.AsyncTask#onPostExecute onPostExecute()}.</li> |
| |
| </ol> |
| |
| <pre> |
| // Downloads XML from stackoverflow.com, parses it, and combines it with |
| // HTML markup. Returns HTML string. |
| private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException { |
| InputStream stream = null; |
| // Instantiate the parser |
| StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser(); |
| List<Entry> entries = null; |
| String title = null; |
| String url = null; |
| String summary = null; |
| Calendar rightNow = Calendar.getInstance(); |
| DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa"); |
| |
| // Checks whether the user set the preference to include summary text |
| SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this); |
| boolean pref = sharedPrefs.getBoolean("summaryPref", false); |
| |
| StringBuilder htmlString = new StringBuilder(); |
| htmlString.append("<h3>" + getResources().getString(R.string.page_title) + "</h3>"); |
| htmlString.append("<em>" + getResources().getString(R.string.updated) + " " + |
| formatter.format(rightNow.getTime()) + "</em>"); |
| |
| try { |
| stream = downloadUrl(urlString); |
| entries = stackOverflowXmlParser.parse(stream); |
| // Makes sure that the InputStream is closed after the app is |
| // finished using it. |
| } finally { |
| if (stream != null) { |
| stream.close(); |
| } |
| } |
| |
| // StackOverflowXmlParser returns a List (called "entries") of Entry objects. |
| // Each Entry object represents a single post in the XML feed. |
| // This section processes the entries list to combine each entry with HTML markup. |
| // Each entry is displayed in the UI as a link that optionally includes |
| // a text summary. |
| for (Entry entry : entries) { |
| htmlString.append("<p><a href='"); |
| htmlString.append(entry.link); |
| htmlString.append("'>" + entry.title + "</a></p>"); |
| // If the user set the preference to include summary text, |
| // adds it to the display. |
| if (pref) { |
| htmlString.append(entry.summary); |
| } |
| } |
| return htmlString.toString(); |
| } |
| |
| // Given a string representation of a URL, sets up a connection and gets |
| // an input stream. |
| private InputStream downloadUrl(String urlString) throws IOException { |
| URL url = new URL(urlString); |
| HttpURLConnection conn = (HttpURLConnection) url.openConnection(); |
| conn.setReadTimeout(10000 /* milliseconds */); |
| conn.setConnectTimeout(15000 /* milliseconds */); |
| conn.setRequestMethod("GET"); |
| conn.setDoInput(true); |
| // Starts the query |
| conn.connect(); |
| return conn.getInputStream(); |
| }</pre> |
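<p>The step that combines entries with HTML markup can be exercised on its own.
The sketch below repeats that loop over hand-made entries. Two things in it are
assumptions rather than part of the sample: the <code>escape()</code> helper,
added because <code>parser.getText()</code> returns unescaped text that could
otherwise break the generated markup, and the names
<code>EntryHtmlSketch</code> and <code>render()</code>. The summary is appended
as-is, as in the sample, because the feed declares it as HTML.</p>

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, device-independent version of the HTML assembly loop.
final class EntryHtmlSketch {
    static final class Entry {
        final String title;
        final String link;
        final String summary;

        Entry(String title, String summary, String link) {
            this.title = title;
            this.summary = summary;
            this.link = link;
        }
    }

    // Minimal escaping for text placed in element content or quoted attributes.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Builds one linked paragraph per entry, optionally followed by its summary.
    static String render(List<Entry> entries, boolean includeSummary) {
        StringBuilder html = new StringBuilder();
        for (Entry entry : entries) {
            html.append("<p><a href='").append(escape(entry.link)).append("'>")
                    .append(escape(entry.title)).append("</a></p>");
            if (includeSummary && entry.summary != null) {
                html.append(entry.summary);
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("Where is my data file?", "<p>I have an Application...</p>",
                        "http://example.com/questions/1"));
        System.out.println(render(entries, true));
    }
}
```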