Considering the following document :
<?xml version='1.0' encoding='UTF-8' ?>
<library>
<book id='1'>Effective Java</book>
<book id='2'>Java Concurrency In Practice</book>
<notABook id='3'>This is not a book element</notABook>
</library>
One can use the following code to parse it and build a map of book titles by book id.
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
public class StaxDemo {
public static void main(String[] args) throws Exception {
String xmlDocument = "<?xml version='1.0' encoding='UTF-8' ?>"
+ "<library>"
+ "<book id='1'>Effective Java</book>"
+ "<book id='2'>Java Concurrency In Practice</book>"
+ "<notABook id='3'>This is not a book element </notABook>"
+ "</library>";
XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
// Various flavors are possible, e.g. from an InputStream, a Source, ...
XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(new StringReader(xmlDocument));
Map<Integer, String> bookTitlesById = new HashMap<>();
// We go through each event using a loop
while (xmlStreamReader.hasNext()) {
switch (xmlStreamReader.getEventType()) {
case XMLStreamConstants.START_ELEMENT:
System.out.println("Found start of element: " + xmlStreamReader.getLocalName());
// Check if we are at the start of a <book> element
if ("book".equals(xmlStreamReader.getLocalName())) {
int bookId = Integer.parseInt(xmlStreamReader.getAttributeValue("", "id"));
String bookTitle = xmlStreamReader.getElementText();
bookTitlesById.put(bookId, bookTitle);
}
break;
// A bunch of other things are possible : comments, processing instructions, Whitespace...
default:
break;
}
xmlStreamReader.next();
}
System.out.println(bookTitlesById);
}
This outputs :
Found start of element: library
Found start of element: book
Found start of element: book
Found start of element: notABook
{1=Effective Java, 2=Java Concurrency In Practice}
In this sample, one must be carreful of a few things :
THe use of xmlStreamReader.getAttributeValue
works because we have checked first that the parser is in the START_ELEMENT
state. In evey other states (except ATTRIBUTES
), the parser is mandated to throw IllegalStateException
, because attributes can only appear at the beginning of elements.
same goes for xmlStreamReader.getTextContent()
, it works because we are at a START_ELEMENT
and we know in this document that the <book>
element has no non-text child nodes.
For more complex documents parsing (deeper, nested elements, ...), it is a good practice to "delegate" the parser to sub-methods or other objets, e.g. have a BookParser
class or method, and have it deal with every element from the START_ELEMENT to the END_ELEMENT of the book XML tag.
One can also use a Stack
object to keep around important datas up and down the tree.