XML and Java
This page will walk you through the basics of XML with Java.
First we'll look at simple parsing of XML, then we'll look at saving and loading objects as XML.
In Java you have two major choices for reading and writing XML:
- Document Object Model (DOM) / Tree-based Parsing: The whole document is read in and processed into a tree-structure that you can then navigate around. The whole document is loaded into memory.
- Stream based Parsing: The document is read in one element at a time, and you are given the attributes of each element. The document is not stored in memory.
In addition, Stream-based Parsing is divided into:
- Push-Parsing / Event-based Parsing: The whole stream is read, and as each element appears in the stream a relevant method is called. The programmer has no control over the rate at which the stream is read.
- Pull-Parsing: The programmer asks for the next element in the XML and can then farm it off for processing. The programmer has complete control over the rate of movement through the XML.
Thus, Java has three broad APIs that match these divisions.
DOM (Document Object Model): javax.xml.parsers
A tree-based parsing API: You get a parser and pass it an InputStream. Once it has read the XML you get it back as a Document. Once you have a Document, you can read and write the XML held in memory with methods like getElementsByTagName and createElement. The key class is DocumentBuilder. This is obtained from a DocumentBuilderFactory, which has various methods to set up the parser, including setValidating, if you wish to check that the XML is valid against its DTD (the parser always checks that the XML is well formed). (For writing DOM data to an actual XML file, see TrAX, below).
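As a rough illustration of the DOM workflow just described, here is a minimal sketch. The class name DomSketch, the element names "points" and "point", and the sample XML are made up for the example; only the javax.xml.parsers and org.w3c.dom calls come from the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical example class: reads XML into a DOM tree, then edits the tree.
class DomSketch {

    // Parse XML held in a String into an in-memory Document.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            return builder.parse(in);
        }
    }

    // Read: count the elements with the given tag name anywhere in the tree.
    static int countElements(Document doc, String tagName) {
        return doc.getElementsByTagName(tagName).getLength();
    }

    // Write: append a new <point> element with text content to the root element.
    static void addPoint(Document doc, String text) {
        Element point = doc.createElement("point");
        point.setTextContent(text);
        doc.getDocumentElement().appendChild(point);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<points><point>1,2</point><point>3,4</point></points>");
        addPoint(doc, "5,6");
        System.out.println(countElements(doc, "point")); // two parsed + one added
    }
}
```

The change made by addPoint only exists in memory; nothing is written back to the XML source.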
SAX (Simple API for XML): javax.xml.parsers
A stream and push/event-based parsing API: You build a handler that implements a set of interfaces, and register the handler with a parser (connecting the parser to an InputStream at the same time). When the parser hits an element it calls the relevant method. Key classes are SAXParser and DefaultHandler. The former is obtained from a SAXParserFactory, which has various methods to set up the parser, including setValidating, if you wish to check that the XML is valid against its DTD (the parser always checks that the XML is well formed). (For writing SAX data to an XML file, see TrAX, below).
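The handler-and-parser arrangement might look something like the sketch below. SaxSketch, NameCollector, and the "point" element with its "name" attribute are invented for illustration; the parser classes are the standard JDK ones.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the "name" attribute of every <point>.
class SaxSketch {

    // The parser calls startElement for each opening tag it meets in the stream.
    static class NameCollector extends DefaultHandler {
        final List<String> names = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            if ("point".equals(qName)) {
                names.add(attributes.getValue("name"));
            }
        }
    }

    // Connect the parser to an InputStream and the handler in one call.
    static List<String> pointNames(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        NameCollector handler = new NameCollector();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<points><point name=\"a\"/><point name=\"b\"/></points>";
        System.out.println(pointNames(xml)); // prints [a, b]
    }
}
```

Note that the code never loops over the document itself: the parser drives, and the handler only reacts.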
StAX (Streaming API for XML): javax.xml.stream
A stream-based pull-parsing API: You ask a parser for each new element, and then request its attributes. The key classes are XMLStreamReader and XMLStreamWriter, though there are also slightly more event-based versions, XMLEventReader and XMLEventWriter (details). The parsers are obtained from an XMLInputFactory, while the writers are obtained from an XMLOutputFactory (Details and examples).
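For comparison with the push style, a pull-parsing loop and its matching writer could be sketched as follows. StaxSketch and the "points"/"point"/"name" vocabulary are assumptions made for the example, not part of any real schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical example class: pulls <point name="..."/> elements and writes them back out.
class StaxSketch {

    // Read: the program asks for each event with next() and decides what to do with it.
    static List<String> pointNames(String xml) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "point".equals(reader.getLocalName())) {
                    // A null namespace means "match the attribute in any namespace".
                    names.add(reader.getAttributeValue(null, "name"));
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    // Write: emit one empty <point> element per name inside a <points> root.
    static String write(List<String> names) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument();
        writer.writeStartElement("points");
        for (String name : names) {
            writer.writeEmptyElement("point");
            writer.writeAttribute("name", name);
        }
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = write(java.util.Arrays.asList("a", "b"));
        System.out.println(xml);
        System.out.println(pointNames(xml));
    }
}
```

Because the loop belongs to the program rather than the parser, it can stop early, skip ahead, or hand the reader to another method part-way through the document.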
Other Java XML stuff:
- JAXP (Java API for XML Processing): Overarching API name for most of the above.
- TrAX (Transformation API For XML): javax.xml.transform
API for transforming between XML flavours using XSLT (Example code). TrAX is important even if you aren't interested in transforming XML, as it is also the way to turn SAX and DOM data into streams for writing/serializing to text files. The key classes are the different implementations of Source (such as DOMSource and SAXSource), along with StreamResult, used with a Transformer (Examples).
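To show how a Transformer can be used purely for serializing, here is a minimal sketch that writes a DOM Document out as text. The class name TraxSketch and the element names are hypothetical; a Transformer created with no stylesheet simply copies its Source to its Result.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical example class: serializes an in-memory DOM tree to a String.
class TraxSketch {

    // With no XSLT stylesheet, newTransformer() gives an identity (copying) transform.
    static String serialize(Document doc) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Build a small document in memory: <points><point>1,2</point></points>
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("points");
        doc.appendChild(root);
        Element point = doc.createElement("point");
        point.setTextContent("1,2");
        root.appendChild(point);

        System.out.println(serialize(doc));
    }
}
```

To write to a file instead of a String, the StreamResult can be constructed from a java.io.File or an OutputStream.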
Marshalling
Marshalling is the saving of Java objects as XML in a text file for later unmarshalling back to working Java objects. This is a bit like serialisation (the saving of objects to binary files) but somewhat more constrained.
JAXB (Java Architecture for XML Binding): javax.xml.bind
Used for generating classes from XML schema, and marshalling objects of these classes to XML files / unmarshalling these back into objects again. The key thing to remember is that objects can only be saved if their data can be mapped to the core XML element types.
The key stages are:
1) Write an XML Schema representing the classes you want Java to work with. The schema has to be in a form that JAXB accepts, but there's actually very little information on what that form is. The examples on these webpages work. Other valid schemas won't necessarily. You might want to start by adapting the XML here.
2) Use this schema, plus the Java tool "xjc", to generate Java files representing the schema. The file will be for a class to represent the root element, but contain classes for the sub-elements as well. Sub-element class objects will be stored in the root element object (or other sub-elements) in a List. Also produced will be an ObjectFactory, which can be used to create specific objects from the classes if the objects aren't going to be read in from XML (for example, if you want to just write data out). Compile all these files.
3) Use the ObjectFactory in your own Java code to generate objects from the classes and then use their set/get methods to fill them with data, OR read in pre-existing objects stored in XML files using an Unmarshaller.
4) If you want to write the objects out as XML (for example, having changed the data), write the root-element object (plus any sub-element objects it contains) to an XML file using a Marshaller.
Example: As StAX is relatively new, and fully-working JAXB examples that work in the way JDK 1.6 suggests are relatively rare, here's an example application containing a fully worked example of both, which uses the XML found in these webpages: XMLExamples.java. It allows you to read objects in from XML and write objects out to XML. Run the code as described in the Docs, and look at how it works. Use these files: test.xml and test.xsd, copied to the same directory as the Java file.
Good sources
Processing XML with Java: Things have moved on since it was written. However, other than the fact that many of the libraries are now in the javax packages, the examples still hold true.
XML and Java for Scientists/Engineers: Growing resource for Scientific XML and Java. Includes SVG data presentation.
The Java Web Services Tutorial: Sun's tutorial covering the newer elements of the XML processing packages.
For information on JAXB specifically, your best bet is the javax.xml.bind API, as most other examples are out of date. However, you can glean a lot from: Details and examples; Simple example; Simple example.
Also worth checking out is the Unofficial JAXB Guide which, for example, explains about dealing with cyclic references between classes.