DOM

XML offers us a powerful mechanism for defining the structure and content of documents. XML can be used to facilitate the same kind of scenario which gave rise to Electronic Data Interchange (EDI): B2B or even department-to-department communications. Using XML in conjunction with Document Type Definitions (DTDs) and a validating parser enables well-defined transactions to take place. So how do we actually use this technology?

I was investigating mechanisms for providing XML communications with various back-ends. I wanted to be able to provide a window into database views as well as Remedy ARS. I first toyed with the idea of creating dynamic DTDs based on the underlying data structure. This would require an interface between the data source and the servlet used for communication with clients. I envisioned a mechanism for publishing the DTD on the same web server which hosted the servlet.

Given the nature of the Remedy interface, this interface began to look quite complex. It was also becoming apparent that in some ways I would be trying to reinvent the wheel. Since JDBC already provides database meta-data, could it also be extended to support Remedy? Investigation of the complexity of extending the JDBC interface in order to communicate with Remedy indicated that such a project would take considerable time and effort.

Finally, Remedy has a peculiar data type which is not defined as a standard JDBC type: the enumeration. Since one of the applications for this mechanism involved the dynamic creation of forms, including pull-down menus for the enumeration type, another solution had to be found. The following diagram shows the general architecture of the process.

As you can see, this diagram still includes the existence of the DTD on the web server. Communication between the client and the servlet is planned as valid XML documents. The question marks denote the interface which would need to be implemented. How best to define this interface?

I decided to approach the problem from a slightly different direction. Was it possible to create a DTD which would be powerful and flexible enough such that the necessary information could be exchanged? Recalling my X.25 days, both layer 2 and layer 3 used a request/response architecture. Since there are enough differences between the two, it doesn't make sense to try to combine them into one. Therefore, we're going to end up with two different DTDs: one for the request, one for the response.

There is something that the request and the response share, however. The actual information contained within the body of the message consists of attribute/value pairs. Whether we're performing a query, modify, insert or delete, the relevant attributes have to be both sent and received. In the case of a query, we might receive multiple records in the response, but the format should not change from one record to the next. This dictates a separate DTD for the content, which can then be included in the request and response documents.

As previously mentioned, the JDBC data type definitions should be adequate for access to a database but we have to accommodate the Remedy enumeration type as well. We also have to have some way of informing the client of the attributes associated with the data source. These details combine to permit us to take a first stab at defining the data record DTD.

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT record (field*)>
<!-- XML requires mixed content in the form (#PCDATA|child)*; a field is meant to hold either enum elements or plain data -->
<!ELEMENT field (#PCDATA|enum)*>
<!ATTLIST field name CDATA #REQUIRED>
<!ATTLIST field type (int|small|dec|float|real|double|char|vchar|date|time|datetime|enum) #IMPLIED>
<!ATTLIST field width CDATA #IMPLIED>
<!ATTLIST field precision CDATA #IMPLIED>
<!ATTLIST field required (yes|no) #IMPLIED>
<!ATTLIST field notnull (yes|no) #IMPLIED>
<!ELEMENT enum (#PCDATA)>
<!ATTLIST enum value CDATA #IMPLIED>

Let's take a look at this DTD in more detail. The record element definition states that it consists of zero or more field elements. The field element is meant to hold either one or more enum elements or just character data; because XML only allows #PCDATA to be mixed with child elements in the form (#PCDATA|enum)*, a validating parser cannot enforce the either/or, so that distinction is left to the application. We have a number of attributes associated with the field element. The field name is a required attribute and consists of character data. Note that the value cannot contain a literal ampersand or left angle bracket, nor the quote character used to delimit it; such characters would have to be written as entity references. This is not a great concern since most RDBMSs won't permit those characters in field names anyway.

The field type, width and precision attributes are sufficient to address all types recognized by JDBC as well as the enumeration required by Remedy. This definition contains just one quirk: the field type is not a required attribute. As we'll see later, there will be a time where it is required in order to make the model work. We can handle that situation programmatically, however. Our overriding concern is to make our definition as reusable as practical.

Rounding out our field attributes are required and notnull. Anyone who has worked with RDBMSs has likely encountered these constraints, so they will not be discussed here. Again, these attributes are optional. The enumeration definition is quite straightforward, with the value attribute also optional. There could be situations, for example, where we provide the enumeration text for a GUI application but need to use a numeric value when dealing with the database field.
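To see what a validating parser makes of these rules, here is a minimal sketch using the standard JAXP classes rather than any particular Xerces release. The class name RecordDtdCheck and the validate method are my own inventions for illustration. The DTD is embedded as an internal subset so that no files are needed, and the field content model is written as (#PCDATA|enum)*, the mixed-content form that XML requires.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates record markup against the record DTD.
final class RecordDtdCheck {
    // The record DTD, embedded as an internal subset (an assumption made
    // here so the example is self-contained).
    static final String DOCTYPE =
        "<!DOCTYPE record [\n"
      + "<!ELEMENT record (field*)>\n"
      + "<!ELEMENT field (#PCDATA|enum)*>\n"
      + "<!ATTLIST field name CDATA #REQUIRED>\n"
      + "<!ATTLIST field type (int|small|dec|float|real|double|char|vchar"
      + "|date|time|datetime|enum) #IMPLIED>\n"
      + "<!ATTLIST field width CDATA #IMPLIED>\n"
      + "<!ATTLIST field precision CDATA #IMPLIED>\n"
      + "<!ATTLIST field required (yes|no) #IMPLIED>\n"
      + "<!ATTLIST field notnull (yes|no) #IMPLIED>\n"
      + "<!ELEMENT enum (#PCDATA)>\n"
      + "<!ATTLIST enum value CDATA #IMPLIED>\n"
      + "]>\n";

    /** Returns the problems the parser reports; an empty list means valid. */
    public static List<String> validate(String recordXml) {
        final List<String> errors = new ArrayList<String>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);   // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                public void error(SAXParseException e) { errors.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + recordXml)));
        } catch (Exception e) {
            // not well formed, or the parser could not be created
            errors.add(String.valueOf(e.getMessage()));
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(
            "<record><field name=\"age\" type=\"int\">36</field></record>"));
        System.out.println(validate(
            "<record><field type=\"blob\">36</field></record>"));
    }
}
```

The second call in main omits the required name attribute and uses a type which is not in the enumerated list, so the parser should report both problems.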

This DTD for the record fields appears to suit our requirements quite well. It maps readily to a JDBC data source and accommodates the Remedy schemas as well. Since the Remedy API does not currently support Java, the complexity of the interface will be significant but not onerous. In fact, the bulk of the work has already been performed. Extending the existing code to operate as a Java Native Interface (JNI) should not require a great deal of effort.

The request DTD is very simple. The only elements we need to include in addition to the record are the command and the authentication elements. Since I envision the communication between client and server to take place over SSL, the authentication information will be encrypted in transit and so can be included in the XML document as "clear text."

This approach allows us to leverage existing mechanisms and eliminates the need to implement encryption algorithms. This also reduces the burden on the client, especially one using a different platform, language or operating system. The whole idea of XML is that the document is human as well as machine readable. Here is the request DTD:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % RecordDef SYSTEM "record.dtd">
<!ELEMENT request (username?,password?,record)>
<!ATTLIST request command (add|modify|delete|search|schema) #REQUIRED>
<!ELEMENT username (#PCDATA)>
<!ELEMENT password (#PCDATA)>
%RecordDef;

Although I mentioned that this was a fairly simple DTD, there are a few interesting elements. First, we declare an external parameter entity named RecordDef which refers to a file named record.dtd, containing the record DTD we discussed previously. We then include it explicitly with the %RecordDef; reference. This brings in the definition for the record element.

In the request element definition, we specify that the username and password must occur either 0 or 1 times. This addresses those scenarios where authentication is not required and yet prevents multiple specification of these elements. Finally, 1 and only 1 record is always required in the request.

The only attribute for the request element is the command, which is an enumeration whose presence is required. So why didn't we include the username and password as attributes to the request element? You may recall from the previous discussion of the CDATA attribute type that certain characters cannot appear literally in an attribute value, and passwords in particular may well contain them. In #PCDATA element content, quotes need no special treatment, and the ampersand and left angle bracket can be escaped (as &amp; and &lt;) or wrapped in a CDATA section.
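A small sketch may make the escaping point concrete. It builds a request with the standard JAXP DOM classes and lets the serializer escape the password; the class name RequestEscaping and both method names are hypothetical, and the DOCTYPE line is left out so that nothing has to be fetched.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper: shows that element content is escaped on output
// and restored on input without any effort from the application.
final class RequestEscaping {

    /** Builds a request document and returns it as text. */
    public static String buildRequest(String command, String username, String password) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();

            Element request = doc.createElement("request");
            request.setAttribute("command", command);
            doc.appendChild(request);

            Element user = doc.createElement("username");
            user.setTextContent(username);
            request.appendChild(user);

            Element pass = doc.createElement("password");
            pass.setTextContent(password);      // raw text; no manual escaping
            request.appendChild(pass);

            request.appendChild(doc.createElement("record"));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build request", e);
        }
    }

    /** Parses a request and returns the password exactly as it was supplied. */
    public static String readPassword(String requestXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(requestXml)));
            return doc.getElementsByTagName("password").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not read request", e);
        }
    }

    public static void main(String[] args) {
        String xml = buildRequest("schema", "joe", "a<b&c");
        System.out.println(xml);                 // the password appears escaped
        System.out.println(readPassword(xml));   // and comes back unescaped
    }
}
```

The application never sees the escaped form: it hands the DOM a plain string and gets the same plain string back after parsing.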

The response element is constructed similarly to the request. We don't require a command code but instead require a return code. Since database operations can generate one or more diagnostic messages, we have to provide for these as well. Finally, the result from a search can consist of more than one record. Here is the definition of the response:

<?xml version="1.0" encoding="UTF-8"?>
<!ENTITY % RecordDef SYSTEM "record.dtd">
<!ELEMENT response (returncode,errormessage*,record*)>
<!ELEMENT returncode (#PCDATA)>
<!ELEMENT errormessage (#PCDATA)>
%RecordDef;

The response consists of a returncode element, zero or more errormessage elements, and zero or more record elements. The barest response then would consist of just the response and returncode elements. Note that, as with the username and password elements of the request, we've made the returncode an element rather than an attribute. Since the returncode element is of type #PCDATA we've given ourselves some flexibility for future implementations.
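The following sketch shows one way to check a response against this definition without publishing the DTD files anywhere. It uses the standard JAXP validating parser with an EntityResolver which hands back the DTD text from memory. The class name ResponseValidator is hypothetical, the record DTD is abbreviated, and the response content model is written out as the prose describes it (one returncode, then any number of errormessage and record elements).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper: validates a response using in-memory copies of the DTDs.
final class ResponseValidator {
    static final String RESPONSE_DTD =
        "<!ENTITY % RecordDef SYSTEM \"record.dtd\">\n"
      + "<!ELEMENT response (returncode,errormessage*,record*)>\n"
      + "<!ELEMENT returncode (#PCDATA)>\n"
      + "<!ELEMENT errormessage (#PCDATA)>\n"
      + "%RecordDef;\n";

    // Abbreviated record DTD (assumption: only the name attribute matters here).
    static final String RECORD_DTD =
        "<!ELEMENT record (field*)>\n"
      + "<!ELEMENT field (#PCDATA|enum)*>\n"
      + "<!ATTLIST field name CDATA #REQUIRED>\n"
      + "<!ELEMENT enum (#PCDATA)>\n"
      + "<!ATTLIST enum value CDATA #IMPLIED>\n";

    /** Returns the problems found; an empty list means the response is valid. */
    public static List<String> validate(String responseXml) {
        final List<String> problems = new ArrayList<String>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Serve both DTD files from the strings above instead of the network.
            builder.setEntityResolver((publicId, systemId) -> {
                if (systemId == null) return null;
                String dtd = systemId.endsWith("response.dtd") ? RESPONSE_DTD
                           : systemId.endsWith("record.dtd") ? RECORD_DTD : null;
                if (dtd == null) return null;    // let the parser use its default
                InputSource source = new InputSource(new StringReader(dtd));
                source.setSystemId(systemId);    // keeps relative references working
                return source;
            });
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(responseXml)));
        } catch (Exception e) {
            problems.add(String.valueOf(e.getMessage()));
        }
        return problems;
    }

    public static void main(String[] args) {
        String doctype = "<!DOCTYPE response SYSTEM \"response.dtd\">\n";
        System.out.println(validate(doctype
            + "<response><returncode>0</returncode></response>"));
        System.out.println(validate(doctype
            + "<response><errormessage>no code</errormessage></response>"));
    }
}
```

The first document in main is the barest valid response; the second leaves out the returncode and so should produce a validation error.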

Now that we know what format our documents should take, let's take a look at a couple of examples. The first phase of communication would typically be a schema request. Here is a sample schema request:

<?xml version="1.0"?>
<!DOCTYPE request SYSTEM "request.dtd">
<request command="schema">
<username>joe</username>
<password>blow</password>
<record/>
</request>

The only unusual construct in this request is the empty record element, indicated by the trailing / just before the tag end indicator (the right angle bracket). This is just a convenient shortcut: <tag/> is the equivalent of <tag></tag>. The empty record is perfectly acceptable; remember, the DTD only requires that the record element exist. The DTD doesn't make any demands on the actual content of the record. A sample response from the server might look like this:

<?xml version="1.0"?>
<!DOCTYPE response SYSTEM "response.dtd">
<response>
<returncode>0</returncode>
<record>
<field name="firstname" type="char" width="16" required="yes"/>
<field name="lastname" type="char" width="16" required="yes"/>
<field name="age" type="int" required="yes"/>
<field name="sex" type="enum" required="yes">
<enum value="0">Male</enum>
<enum value="1">Female</enum>
</field>
<field name="employeeid" type="char" width="16"/>
</record>
</response>

We use the same record and field notation here as in the request. We aren't returning any actual field content, merely the definitions of the fields and their attributes. In fact, the only field element which has content is the enumerated field. Note how conveniently this information could be utilized by a GUI application. We know that we have four input fields (three character, one integer) and one menu, and we know the field widths and the menu options. The required attribute can be utilized by a Java applet or even JavaScript to perform field validation before sending an XML document to the server.
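As a rough sketch of how a client might turn such a schema response into form widgets, the following code walks the field elements and produces a one-line description of each. The class name FormSketch, the describeWidgets method and the textual widget descriptions are all hypothetical; a real client would create actual GUI components instead of strings.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: describes the form widgets implied by a schema response.
final class FormSketch {

    public static List<String> describeWidgets(String schemaResponseXml) {
        List<String> widgets = new ArrayList<String>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(schemaResponseXml)));
            NodeList fields = doc.getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                String name = field.getAttribute("name");
                StringBuilder widget = new StringBuilder();

                if ("enum".equals(field.getAttribute("type"))) {
                    // enumerations become a menu with one option per enum element
                    widget.append("menu ").append(name).append(" [");
                    NodeList options = field.getElementsByTagName("enum");
                    for (int j = 0; j < options.getLength(); j++) {
                        Element option = (Element) options.item(j);
                        if (j > 0) widget.append(", ");
                        widget.append(option.getAttribute("value")).append('=')
                              .append(option.getTextContent().trim());
                    }
                    widget.append(']');
                } else {
                    // everything else becomes a text input, limited by width if given
                    widget.append("input ").append(name);
                    String width = field.getAttribute("width"); // "" when absent
                    if (!width.isEmpty()) widget.append(" maxlength=").append(width);
                }
                if ("yes".equals(field.getAttribute("required"))) {
                    widget.append(" required");
                }
                widgets.add(widget.toString());
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not read schema response", e);
        }
        return widgets;
    }

    public static void main(String[] args) {
        String xml = "<response><returncode>0</returncode><record>"
            + "<field name=\"firstname\" type=\"char\" width=\"16\" required=\"yes\"/>"
            + "<field name=\"sex\" type=\"enum\" required=\"yes\">"
            + "<enum value=\"0\">Male</enum><enum value=\"1\">Female</enum></field>"
            + "</record></response>";
        for (String widget : describeWidgets(xml)) {
            System.out.println(widget);
        }
    }
}
```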

Given the above response, let's create a document which adds a new employee. We assume that we've got a reasonably intelligent client which abides by the required attribute and sends a document like this:

<?xml version="1.0"?>
<!DOCTYPE request SYSTEM "request.dtd">
<request command="add">
<username>joe</username>
<password>blow</password>
<record>
<field name="firstname">Tim</field>
<field name="lastname">Smith</field>
<field name="age">36</field>
<field name="sex">0</field>
</record>
</request>

Just to more fully demonstrate the use of the fields, let's assume for a moment that the server back-end already has a record for Tim Smith. It might send back a response somewhat like the following:

<?xml version="1.0"?>
<!DOCTYPE response SYSTEM "response.dtd">
<response>
<returncode>-1</returncode>
<errormessage>Duplicate record found for user</errormessage>
</response>

There are a couple of things to notice here. We have included the optional element errormessage, and we don't have to include a record element (even an empty one) since it's not required by the DTD. If the add operation completed successfully, we might receive a response like this:

<?xml version="1.0"?>
<!DOCTYPE response SYSTEM "response.dtd">
<response>
<returncode>0</returncode>
<record>
<field name="employeeid">3984</field>
</record>
</response>

In this example, the server responds with a returncode of 0 and the new employeeid value. Again, this is just a hypothetical example to demonstrate the model I've discussed. Now that we know what the requests and replies look like, how do we go about creating them and reading them?

There are two basic APIs for interacting with XML documents from the Java language: the Simple API for XML (SAX) and the Document Object Model (DOM). I've chosen to use DOM since the object orientation seems more in keeping with the Java philosophy. DOM also appears to be the API experiencing the most development effort, with the Beta version of DOM level 2 now available.

It should be noted that while DOM is appropriate in many situations, it makes considerable demands on the memory of the Java virtual machine. If large result sets from database lookups are anticipated, for example, then the SAX approach might be more appropriate.
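For comparison, here is a minimal sketch of the SAX style, using the standard JAXP SAXParserFactory. It pulls the return code out of a response and counts the records as they stream past, without ever building a tree. The class name SaxRecordCounter and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the return code and the number of records
// seen in a response, holding only the current text in memory.
final class SaxRecordCounter extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private String returnCode;
    private int records;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) {
        text.setLength(0);                 // start collecting text afresh
        if ("record".equals(qName)) {
            records++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);    // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("returncode".equals(qName)) {
            returnCode = text.toString().trim();
        }
    }

    public String returnCode() { return returnCode; }

    public int records() { return records; }

    /** Runs the handler over the given response text. */
    public static SaxRecordCounter scan(String responseXml) {
        SaxRecordCounter handler = new SaxRecordCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(responseXml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse response", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        SaxRecordCounter result = scan(
            "<response><returncode>0</returncode>"
          + "<record><field name=\"age\">36</field></record>"
          + "<record><field name=\"age\">54</field></record></response>");
        System.out.println(result.returnCode() + " / " + result.records());
    }
}
```

The trade-off is that the handler has to keep track of where it is in the document itself, which is exactly the bookkeeping DOM does for us.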

Using the document format discussed above, let's take a look at the code which could be used to create a simple document response with a "0" return code.

 1 import  org.apache.xerces.parsers.*;
 2 import  org.apache.xerces.utils.*;
 3 import  org.apache.xerces.framework.*;
 4 import  org.apache.xalan.xpath.xml.TreeWalker;
 5 import  org.apache.xalan.xpath.xml.FormatterToXML;
 6 import  org.xml.sax.*;
 7 import  org.w3c.dom.*;
 8 
 9     public int sendXMLresponse( Writer out, String rc ) {
10         DOMParser   parser = null;
11         Document    doc = null;
12         Element     response = null;
13         Element     elem = null;
14 
15         try {
16             /*
17              * create document
18              */
19 
20             parser = new DOMParser();
21             parser.startDocument();
22             doc = parser.getDocument();
23             response = doc.createElement( "response" );
24             response.appendChild( doc.createTextNode( "\n" ) );
25             doc.appendChild( response );
26             elem = doc.createElement( "returncode" );
27             elem.appendChild( doc.createTextNode( rc ) );
28             response.appendChild( elem );
29             response.appendChild( doc.createTextNode( "\n" ) );
30             parser.endDocument();
31 
32             /*
33              * export to Writer
34              */
35 
36             FormatterToXML fl = new FormatterToXML( out );
37             TreeWalker tw = new TreeWalker( fl );
38             tw.traverse( doc );
39         }
40         catch( Exception e ) {
41             e.printStackTrace();
42             return( -1 );
43         }
44         return( 0 );
45     }

I've numbered the lines in the file to make it easier to discuss some of the interesting elements included. Firstly, note the number of imports we have to use. I'm using the Apache Xalan package, which includes Xerces. The packages you will need to import will, of course, depend on the toolset you choose.

On line 20 we create a new DOM parser. At this point the parser can be considered a blank slate, so the first thing we have to do is start the document (line 21). Now we can get a handle to the document root (line 22). The document root will only accept a single element as the root element; trying to add more will throw an exception. We create the root element on line 23 and add it to the document, as the root, on line 25.

So why are we adding a text node, consisting of just a newline, on line 24? You'll note that we also do this on line 29. The nature of the DOM API is such that all elements will be nested properly but with absolutely no formatting. Without the text nodes, the output will look like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<response><returncode>0</returncode></response>

However, with the text nodes included, we get the following:

<?xml version="1.0" encoding="ISO-8859-1"?>
<response>
<returncode>0</returncode>
</response>

On lines 26-28 we create the returncode element, give it a text node holding the return code (the DTD declares its content as #PCDATA), and append it to the response element. On line 30 we close the document and on lines 36-38 we generate the output text document. Although there is a class called XMLSerializer, I have not been able to get it to generate text files in the desired format. Instead, we use the TreeWalker facility to generate the output to a Writer object.
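If the Xerces and Xalan classes used above are not available, roughly the same result can be had with the standard JAXP classes. This is a sketch of that alternative, not a drop-in replacement: the class name SimpleResponseWriter is hypothetical, and the exact whitespace and XML declaration produced depend on the serializer shipped with the Java runtime.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical JAXP-based variant of the simple response writer.
final class SimpleResponseWriter {

    /** Writes a response containing only a returncode; returns 0 on success. */
    public static int sendXMLresponse(Writer out, String rc) {
        try {
            // create an empty document directly, no parser object required
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();

            Element response = doc.createElement("response");
            doc.appendChild(response);

            Element returncode = doc.createElement("returncode");
            returncode.appendChild(doc.createTextNode(rc));
            response.appendChild(returncode);

            // export to Writer; INDENT asks the serializer for line breaks,
            // so no newline text nodes are added by hand
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.transform(new DOMSource(doc), new StreamResult(out));
        } catch (Exception e) {
            e.printStackTrace();
            return -1;
        }
        return 0;
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        sendXMLresponse(out, "0");
        System.out.println(out);
    }
}
```

Leaving the formatting to the serializer also keeps whitespace text nodes out of the tree, which simplifies any code that later walks it.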

DOM provides a hierarchical view of the document. The document is represented as a tree of nodes, each of which can have child nodes associated with it. The Document node is the parent of the root element, which is the ultimate ancestor of all other elements. Using the example above, we end up with a structure like this:
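One way to see this structure for yourself is to walk the tree and print each node, indented by its depth. The sketch below does that with the standard org.w3c.dom interfaces; the class name DomTreePrinter and its methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: prints the node hierarchy of a DOM document.
final class DomTreePrinter {

    /** Returns one line per node, indented two spaces per level. */
    public static String describe(Node node) {
        StringBuilder sb = new StringBuilder();
        walk(node, 0, sb);
        return sb.toString();
    }

    private static void walk(Node node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE) {
            // show the text, with newlines made visible
            sb.append(" \"").append(node.getNodeValue().replace("\n", "\\n")).append('"');
        }
        sb.append('\n');
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, depth + 1, sb);
        }
    }

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(describe(parse(
            "<response>\n<returncode>0</returncode>\n</response>")));
        // prints:
        // #document
        //   response
        //     #text "\n"
        //     returncode
        //       #text "0"
        //     #text "\n"
    }
}
```

Note that the two newline text nodes appear as children of the response element alongside returncode.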

Now that we know how to create a simple response, let's take a look at how we would generate a response to a query which returns multiple record elements. It's actually just an extension of what we've already seen. Just keep in mind that we're creating a hierarchy and that children have to be associated with the appropriate parent elements.

 1 public class sendXML {
 2     public static int sendXMLresponse( Writer out, String rc, String error,
 3       KeyValue kv[] ) {
 4         DOMParser   parser = null;
 5         Document    doc = null;
 6         NodeList    nodes = null;
 7         Element     response = null;
 8         Element     record = null;
 9         Element     elem = null;
10         String      key = null;
11         String      value = null;
12 
13         try {
14             /*
15              * create document
16              */
17 
18             parser = new DOMParser();
19             parser.startDocument();
20             doc = parser.getDocument();
21             response = doc.createElement( "response" );
22             response.appendChild( doc.createTextNode( "\n" ) );
23             doc.appendChild( response );
24             elem = doc.createElement( "returncode" );
25             elem.appendChild( doc.createTextNode( rc ) );
26             response.appendChild( elem );
27             response.appendChild( doc.createTextNode( "\n" ) );
28             if( error != null ) {
29                 elem = doc.createElement( "errormessage" );
30                 elem.appendChild( doc.createTextNode( error ) );
31                 response.appendChild( elem );
32                 response.appendChild( doc.createTextNode( "\n" ) );
33             }
34 
35             /*
36              * loop through all array elements
37              */
38 
39             for( int i = 0; i < kv.length; i++ ) {
40 
41                 /*
42                  * create new record element
43                  */
44 
45                 record = doc.createElement( "record" );
46                 record.appendChild( doc.createTextNode( "\n" ) );
47 
48                 /*
49                  * loop through all key/value pairs
50                  */
51 
52                 Enumeration e = kv[i].getKeys();
53                 while( e.hasMoreElements() ) {
54                     key = (String) e.nextElement();
55                     value = kv[i].getValue( key );
56                     elem = doc.createElement( "field" );
57                     elem.setAttribute( "name", key );
58                     elem.appendChild( doc.createTextNode( value ) );
59                     record.appendChild( elem );
60                     record.appendChild( doc.createTextNode( "\n" ) );
61                 }
62 
63                 /*
64                  * add record to response
65                  */
66 
67                 response.appendChild( record );
68                 response.appendChild( doc.createTextNode( "\n" ) );
69             }
70             parser.endDocument();
71 
72             /*
73              * create output document
74              */
75 
76             FormatterToXML fl = new FormatterToXML( out );
77             TreeWalker tw = new TreeWalker( fl );
78             tw.traverse( doc );
79         }
80         catch( Exception e ) {
81             System.err.println( e.toString() );
82             e.printStackTrace();
83             return( -1 );
84         }
85         return( 0 );
86     }
87 }

The KeyValue class implementation is not particularly important; what is relevant are the enhancements to the previous example. On lines 28-33 we create an errormessage element only if the String argument provided is non-null. It's appended to the response in the same way as the (required) returncode element.
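For readers who want something to compile against, here is one possible KeyValue, inferred purely from the three methods the listings call (getKeys, getValue and addKey). It is a guess at the shape of the class, not the original implementation; the use of a LinkedHashMap to preserve insertion order is my own assumption.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

// A guessed-at KeyValue: an ordered collection of field name/value pairs.
final class KeyValue {
    // LinkedHashMap keeps fields in the order they were added (an assumption;
    // the listings do not say whether order matters)
    private final Map<String, String> pairs = new LinkedHashMap<String, String>();

    /** Adds a field, replacing any earlier value stored under the same key. */
    public void addKey(String key, String value) {
        pairs.put(key, value);
    }

    /** Returns the value stored for the key, or null if there is none. */
    public String getValue(String key) {
        return pairs.get(key);
    }

    /** Returns the field names in insertion order. */
    public Enumeration<String> getKeys() {
        return Collections.enumeration(pairs.keySet());
    }

    public static void main(String[] args) {
        KeyValue kv = new KeyValue();
        kv.addKey("firstname", "Tim");
        kv.addKey("lastname", "Smith");
        Enumeration<String> keys = kv.getKeys();
        while (keys.hasMoreElements()) {
            String key = keys.nextElement();
            System.out.println(key + " = " + kv.getValue(key));
        }
    }
}
```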

On lines 45-46 we create a record element for each KeyValue object in the array argument. We add the record to the response on line 67. Similarly, we create a new field element for each key/value pair on line 56. We introduce the use of element attributes on line 57. We can add as many attributes to an element as we need. Finally, we add the field element to the record element on line 59.

With the appropriate arguments, a sample output XML document is shown here:

<?xml version="1.0" encoding="ISO-8859-1"?>
<response>
<returncode>0</returncode>
<record>
<field name="firstname">Fred</field>
<field name="lastname">Scuttle</field>
<field name="age">54</field>
<field name="sex">M</field>
<field name="employeeid">912</field>
</record>
<record>
<field name="firstname">Tim</field>
<field name="lastname">Smith</field>
<field name="age">36</field>
<field name="sex">M</field>
<field name="employeeid">3984</field>
</record>
</response>

Now that we've created the document, it's time to look at how to extract the information at the receiving end. Here's some sample code for parsing a response XML document:

 1     public static KeyValue[] receiveXMLresponse( Reader in, StringBuffer rc,
 2       StringBuffer errorString ) {
 3         DOMParser   parser = null;
 4         Document    doc = null;
 5         NodeList    nodes = null;
 6         Node        record = null;
 7         NodeList    children = null;
 8         Node        child = null;
 9         NodeList    kids = null;
10         Node        kid = null;
11         Element     elem = null;
12         NamedNodeMap    attributes = null;
13         KeyValue    result[] = null;
14         KeyValue    kv = null;
15         Vector      results = new Vector();
16         String      key = null;
17         String      value = null;
18 
19         parser = new DOMParser();
20         try {
21             parser.parse( new InputSource( in ) );
22         }
23         catch( Exception e ) {
24             e.printStackTrace();
25             return( null );
26         }
27 
28         doc = parser.getDocument();
29         nodes = doc.getElementsByTagName( "returncode" );
30         rc.setLength( 0 );
31         children = nodes.item( 0 ).getChildNodes();
32         rc.append( children.item( 0 ).getNodeValue() );
33         nodes = doc.getElementsByTagName( "errormessage" );
34         if( nodes.getLength() > 0 ) {
35             children = nodes.item( 0 ).getChildNodes();
36             errorString.append( children.item( 0 ).getNodeValue() );
37         }
38         nodes = doc.getElementsByTagName( "record" );
39         for( int i = 0; i < nodes.getLength(); i++ ) {
40             kv = new KeyValue();
41             record = nodes.item( i );
42             children = record.getChildNodes();
43             for( int j = 0; j < children.getLength(); j++ ) {
44                 child = children.item( j );
45                 if( child.getNodeType() != org.w3c.dom.Node.ELEMENT_NODE )
46                     continue;
47                 key = null;
48                 value = null;
49                 elem = (Element) child;
50                 attributes = elem.getAttributes();
51                 for( int k = 0; k < attributes.getLength(); k++ ) {
52                     Node attr = attributes.item( k );
53                     if( attr.getNodeName().equals( "name" ) )
54                         key = attr.getNodeValue();
55                 }
56                 kids = elem.getChildNodes();
57                 value = kids.item( 0 ).getNodeValue();
58                 kv.addKey( key, value );
59             }
60             results.addElement( kv );
61         }
62         result = new KeyValue[results.size()];
63         for( int i = 0; i < results.size(); i++ )
64             result[i] = (KeyValue) results.elementAt( i );
65         return( result );
66     }

Again, the details of the KeyValue structure are not as important as the way we navigate the document tree. For convenience, I use a Vector for temporary storage of KeyValue objects before allocating and populating a new array in lines 62-64. We return an object reference to the new array in line 65. Also, the imports have been excluded from this example.

The basic mechanism for obtaining references to elements is shown in lines 29, 33 and 38, namely the getElementsByTagName method of the Document interface. We obtain a reference to each record Node in turn on line 41. On line 42, we obtain a reference to a NodeList object containing the children of the Node.

We loop through the children and select one child at a time on line 44. Lines 45-46 are vitally important since we only want to deal with element node children. When we were creating the document, we added text nodes (newlines) in order to make the XML file more human-readable. These are also children of the record element, so we need to ignore them when parsing (they will appear as nodes of type org.w3c.dom.Node.TEXT_NODE, with a name of "#text").

In lines 50-55, we loop through the attributes associated with the field element, looking for the "name" attribute. We extract the text node associated with the field element (the actual field data) in lines 56-57. Once we have the field name and value, we add them to our KeyValue object (line 58) and eventually add the fully populated object to the Vector (line 60).

I should mention that this is a very minimal example; there is no error checking code, for example. Pragmatic programmers will always perform the appropriate tests (null pointers, array lengths, etc.) Note also how this code is written with full knowledge of the DTD used to create the XML document. This is a side-effect of the approach taken in this exercise. By working from the DTD out, on both client and server platforms, we have a mechanism which addresses the original requirements.
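To give an idea of what those tests might look like, here is a more defensive sketch of the receiving side, written against the standard JAXP classes. The class name ResponseReader and its nested Result type are hypothetical, and plain maps stand in for the KeyValue class. It rejects a response with no returncode, tolerates empty fields, and skips fields which have no name.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical defensive reader for response documents.
final class ResponseReader {

    static final class Result {
        final String returnCode;
        final List<String> errors = new ArrayList<String>();
        final List<Map<String, String>> records = new ArrayList<Map<String, String>>();
        Result(String returnCode) { this.returnCode = returnCode; }
    }

    public static Result read(Reader in) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(in));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed response", e);
        }

        // the returncode is mandatory, so refuse to continue without it
        NodeList codes = doc.getElementsByTagName("returncode");
        if (codes.getLength() == 0) {
            throw new IllegalArgumentException("response has no returncode");
        }
        Result result = new Result(codes.item(0).getTextContent().trim());

        NodeList messages = doc.getElementsByTagName("errormessage");
        for (int i = 0; i < messages.getLength(); i++) {
            result.errors.add(messages.item(i).getTextContent());
        }

        NodeList recordNodes = doc.getElementsByTagName("record");
        for (int i = 0; i < recordNodes.getLength(); i++) {
            Map<String, String> fields = new LinkedHashMap<String, String>();
            NodeList children = recordNodes.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() != Node.ELEMENT_NODE) {
                    continue;                    // skip whitespace text nodes
                }
                Element field = (Element) child;
                if (!field.hasAttribute("name")) {
                    continue;                    // nothing to key the value on
                }
                // getTextContent returns "" for an empty element, never null
                fields.put(field.getAttribute("name"), field.getTextContent());
            }
            result.records.add(fields);
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = read(new StringReader(
            "<response>\n<returncode>0</returncode>\n<record>\n"
          + "<field name=\"employeeid\">3984</field>\n</record>\n</response>"));
        System.out.println(r.returnCode + " " + r.records);
    }
}
```

Throwing an exception for a missing returncode is one choice among several; returning a distinct error result would work just as well.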

This example also serves to demonstrate how easy it is to exchange XML documents between client and server once the rules have been defined. Although the example demonstrates the use of the Java language and Apache Xalan/Xerces, we are by no means limited to this environment; a client written in Perl or C would work just as well. Any programming language which is capable of reading and writing text files, and communicating with a web server, can successfully exchange data with our hypothetical server.

NOTE:

I've used the Reader and Writer classes for the example code included herein. I've used generic classes in order to provide the greatest amount of flexibility. The following lines demonstrate how I can readily obtain a Writer from a File, HttpServletResponse or even a URLConnection.

File file;
Writer w = new FileWriter( file );

HttpServletResponse resp;
Writer w = new OutputStreamWriter( resp.getOutputStream() );

URLConnection conn;
Writer w = new OutputStreamWriter( conn.getOutputStream() );