5. The DOM module

DOM is another standard associated with XML, in which the XML stream is represented as a tree in memory. This tree can be manipulated at will: you can add new nodes, remove existing nodes, change attributes, and so on.

Since the tree contains all the information from the XML stream, it can in turn be dumped back to a stream.

As an example, most modern web browsers provide a DOM interface to the document currently loaded in the browser. Using JavaScript, one can thus modify the document dynamically. The calls used to do so are similar to the ones XML/Ada provides for manipulating a DOM tree, and all are defined in the DOM standard.

The W3C committee (http://www.w3c.org) has defined several versions of the DOM, each building on the previous one and adding several enhancements.

XML/Ada currently supports the second revision of DOM (DOM 2.0), which mostly adds namespace support to the first revision. The third revision, which adds support for loading and saving XML streams in a standardized fashion, is not supported at this point.

Although it doesn’t support DOM 3.0, XML/Ada provides subprograms for doing similar things.

Only the Core module of the DOM standard is currently implemented; other modules will follow.

Note that the file sax-encodings.ads specifies the encoding used to store the tree in memory. Full compatibility with the DOM standard requires UTF-16; however, UTF-8 is generally much more memory-efficient for European languages. You can freely change this setting and recompile XML/Ada.

5.1. Using DOM

In XML/Ada, the DOM tree is built through a special implementation of a SAX parser, provided in the DOM.Readers package.

Using DOM to read an XML document is similar to using SAX: one must set up an input stream, then parse the document and get the tree. This is done with code similar to the following:

--
--  Copyright (C) 2017, AdaCore
--

with Input_Sources.File; use Input_Sources.File;
with Sax.Readers;        use Sax.Readers;
with DOM.Readers;        use DOM.Readers;
with DOM.Core;           use DOM.Core;

procedure DomExample is
   Input  : File_Input;
   Reader : Tree_Reader;
   Doc    : Document;
begin
   --  Open the XML file to parse
   Set_Public_Id (Input, "Preferences file");
   Open ("pref.xml", Input);

   --  Disable validation and namespace processing
   Set_Feature (Reader, Validation_Feature, False);
   Set_Feature (Reader, Namespace_Feature, False);

   --  Parse the document; the reader builds the DOM tree as it goes
   Parse (Reader, Input);
   Close (Input);

   --  Get a handle on the tree built in memory
   Doc := Get_Tree (Reader);

   --  Release the reader once we are done with the tree
   Free (Reader);
end DomExample;

This code is almost identical to the code used to demonstrate SAX (see Using SAX).

The two main differences are:

  • We no longer need to define our own XML reader; we simply use the one provided in DOM.Readers.

  • We therefore do not add our own callbacks to react to XML events. Instead, once parsing is finished, the program calls Get_Tree to get a handle on the tree that was built in memory.

The tree can now be manipulated to access the stored values. Implementing the same thing we did with SAX would look like this:

--
--  Copyright (C) 2017, AdaCore
--

with Input_Sources.File; use Input_Sources.File;
with Sax.Readers;        use Sax.Readers;
with DOM.Readers;        use DOM.Readers;
with DOM.Core;           use DOM.Core;
with DOM.Core.Documents; use DOM.Core.Documents;
with DOM.Core.Nodes;     use DOM.Core.Nodes;
with DOM.Core.Attrs;     use DOM.Core.Attrs;
with Ada.Text_IO;        use Ada.Text_IO;

procedure DomExample2 is
   Input  : File_Input;
   Reader : Tree_Reader;
   Doc    : Document;
   List   : Node_List;
   N      : Node;
   A      : Attr;
begin
   Set_Public_Id (Input, "Preferences file");
   Open ("pref.xml", Input);

   Set_Feature (Reader, Validation_Feature, False);
   Set_Feature (Reader, Namespace_Feature, False);

   Parse (Reader, Input);
   Close (Input);

   Doc := Get_Tree (Reader);

   --  Collect every <pref> element of the document
   List := Get_Elements_By_Tag_Name (Doc, "pref");

   --  Node lists are indexed from 0, as in the DOM standard
   for Index in 1 .. Length (List) loop
      N := Item (List, Index - 1);
      A := Get_Named_Item (Attributes (N), "name");

      --  Skip elements with no "name" attribute or no content
      if A /= null and then First_Child (N) /= null then
         Put_Line ("Value of """ & Value (A) & """ is "
                   & Node_Value (First_Child (N)));
      end if;
   end loop;

   --  The list returned by Get_Elements_By_Tag_Name must be freed
   Free (List);

   Free (Reader);
end DomExample2;

The code is much simpler than with SAX, since XML/Ada does most of the work internally. In particular, with SAX we had to take into account the fact that the textual contents of a node could be reported in several events. With DOM, the tree is initially normalized, i.e. adjacent text nodes are merged whenever possible.

This simplicity has a drawback: the amount of memory required to represent even a simple tree.

XML/Ada reduces the memory needed to represent a tree by sharing strings as much as possible (this is controlled by constants at the beginning of dom-core.ads). Still, DOM requires a significant amount of information to be kept for each node.

For very large XML streams, it may be impossible to keep the whole tree in memory. In that case, you can implement your own ad hoc storage on top of a SAX parser; the source of dom-readers.adb is a useful starting point for writing such a parser.
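The "whenever possible" matters: normalization only merges text nodes that are adjacent, so if an element's text is interrupted by another node (a comment, for instance, when comments are kept in the tree), the element still has several Text children and First_Child returns only the first one. The sketch below collects all of them. It builds a tiny tree by hand so that it needs no input file, and it assumes the XML/Ada profiles of Create_Document (DOM.Core), Create_Element, Create_Text_Node and Create_Comment (DOM.Core.Documents), and Append_Child, Node_Type, Next_Sibling and Node_Value (DOM.Core.Nodes). The names Text_Of and Text_Content_Demo are hypothetical.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;           use Ada.Text_IO;
with DOM.Core;              use DOM.Core;
with DOM.Core.Documents;    use DOM.Core.Documents;
with DOM.Core.Nodes;        use DOM.Core.Nodes;

procedure Text_Content_Demo is

   --  Hypothetical helper: concatenate the data of every Text child of N,
   --  skipping any other kind of child (comments, elements, ...)
   function Text_Of (N : Node) return String is
      Child  : Node := First_Child (N);
      Result : Unbounded_String;
   begin
      while Child /= null loop
         if Node_Type (Child) = Text_Node then
            Append (Result, Node_Value (Child));
         end if;
         Child := Next_Sibling (Child);
      end loop;
      return To_String (Result);
   end Text_Of;

   Impl  : DOM_Implementation;
   Doc   : Document;
   Pref  : Element;
   Dummy : Node;
begin
   --  Build <pref>bl<!--split-->ue</pref> by hand
   Doc   := Create_Document (Impl);
   Pref  := Append_Child (Doc, Create_Element (Doc, "pref"));
   Dummy := Append_Child (Pref, Create_Text_Node (Doc, "bl"));
   Dummy := Append_Child (Pref, Create_Comment (Doc, "split"));
   Dummy := Append_Child (Pref, Create_Text_Node (Doc, "ue"));

   --  First_Child only sees the first chunk of text
   Put_Line (Node_Value (First_Child (Pref)));

   --  Text_Of sees all of them
   Put_Line (Text_Of (Pref));
end Text_Content_Demo;
```

Walking the siblings with Next_Sibling avoids allocating a separate Node_List for the children.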
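As a minimal sketch of that approach, the reader below keeps only a counter instead of a tree, so its memory use does not grow with the document. It assumes that Sax.Readers.Sax_Reader can be extended directly and that its Start_Element can be overridden with the same NS/Local_Name/Atts profile used by the DOM_With_Location example later in this chapter. The names Counting_Reader, Counters and Count_Elements are hypothetical.

```ada
with Ada.Text_IO;        use Ada.Text_IO;
with Input_Sources.File; use Input_Sources.File;
with Sax.Readers;        use Sax.Readers;
with Sax.Symbols;
with Sax.Utils;

procedure Count_Elements is

   package Counters is
      --  A SAX reader that stores a single counter instead of a tree
      type Counting_Reader is new Sax_Reader with record
         Count : Natural := 0;
      end record;

      overriding procedure Start_Element
        (Handler    : in out Counting_Reader;
         NS         : Sax.Utils.XML_NS;
         Local_Name : Sax.Symbols.Symbol;
         Atts       : Sax_Attribute_List);
   end Counters;

   package body Counters is
      overriding procedure Start_Element
        (Handler    : in out Counting_Reader;
         NS         : Sax.Utils.XML_NS;
         Local_Name : Sax.Symbols.Symbol;
         Atts       : Sax_Attribute_List)
      is
         pragma Unreferenced (NS, Local_Name, Atts);
      begin
         --  Called once per opening tag: just count it
         Handler.Count := Handler.Count + 1;
      end Start_Element;
   end Counters;

   use Counters;

   Input  : File_Input;
   Reader : Counting_Reader;
begin
   Open ("pref.xml", Input);
   Parse (Reader, Input);
   Close (Input);

   Put_Line ("Elements:" & Natural'Image (Reader.Count));
end Count_Elements;
```

A real application would replace the counter with whatever compact structure it needs, filled from the callbacks it overrides.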

5.2. Editing DOM trees

Once in memory, DOM trees can be manipulated through subprograms provided by the DOM API.

Each of these subprograms is fully documented both in the Ada specs (the *.ads files) and in the DOM standard itself, which XML/Ada follows fully.
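For instance, a tree can be created from scratch and then edited. The sketch below assumes the XML/Ada profiles of Create_Document (DOM.Core), Create_Element and Create_Text_Node (DOM.Core.Documents), Set_Attribute (DOM.Core.Elements), and Append_Child and Write (DOM.Core.Nodes); check them against the specs. The element names and values are hypothetical, chosen to mirror the pref.xml examples above.

```ada
with Ada.Text_IO;              use Ada.Text_IO;
with Ada.Text_IO.Text_Streams; use Ada.Text_IO.Text_Streams;
with DOM.Core;                 use DOM.Core;
with DOM.Core.Documents;       use DOM.Core.Documents;
with DOM.Core.Elements;        use DOM.Core.Elements;
with DOM.Core.Nodes;           use DOM.Core.Nodes;

procedure Build_Prefs is
   Impl  : DOM_Implementation;
   Doc   : Document;
   Root  : Element;
   Pref  : Element;
   Dummy : Node;
begin
   --  Create an empty document and its root element
   Doc  := Create_Document (Impl);
   Root := Append_Child (Doc, Create_Element (Doc, "preferences"));

   --  Add <pref name="colour">blue</pref> under the root
   Pref := Append_Child (Root, Create_Element (Doc, "pref"));
   Set_Attribute (Pref, "name", "colour");
   Dummy := Append_Child (Pref, Create_Text_Node (Doc, "blue"));

   --  Edit the tree: setting an existing attribute replaces its value
   Set_Attribute (Pref, "name", "color");

   --  Dump the result on standard output
   Write (Stream (Current_Output), Doc);
   New_Line;
end Build_Prefs;
```

Append_Child is a function returning the node it appended, which is why its result is either kept (Root, Pref) or assigned to Dummy.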

One important note, however, concerns the use of strings. Various subprograms allow you to set the textual content of a node, modify its attributes, and so on. Such subprograms take a Byte_Sequence as a parameter.

This Byte_Sequence must always be encoded in the encoding defined in the package Sax.Encodings (as described earlier, changing this package requires recompiling XML/Ada). By default, this is UTF-8.

Therefore, if you need to set an attribute to a string encoded, for instance, in ISO-8859-15, you should first convert it with the function Unicode.Encodings.Convert. The code would thus look as follows:

Set_Attribute (N, "name", Convert ("å", From => Get_By_Name ("iso-8859-15")));
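Note that the literal "å" above is only a single ISO-8859-15 byte if the Ada source file itself is saved in that encoding. The sketch below avoids that dependency by spelling the character as its byte value 16#E5#, then reports how many bytes it takes before and after conversion. It assumes Convert accepts From and To parameters of type Unicode_Encoding, and that "utf-8" is a name known to Get_By_Name; Convert_Demo is a hypothetical name.

```ada
with Ada.Text_IO;       use Ada.Text_IO;
with Unicode.Encodings; use Unicode.Encodings;

procedure Convert_Demo is
   --  "å" as its single ISO-8859-15 byte, independent of the encoding
   --  this source file is saved in
   Latin9 : constant String := (1 => Character'Val (16#E5#));

   --  Re-encode it in UTF-8, the default internal encoding of XML/Ada
   Utf8 : constant String :=
     Convert (Latin9,
              From => Get_By_Name ("iso-8859-15"),
              To   => Get_By_Name ("utf-8"));
begin
   Put_Line ("ISO-8859-15 bytes:" & Natural'Image (Latin9'Length));
   Put_Line ("UTF-8 bytes:" & Natural'Image (Utf8'Length));
end Convert_Demo;
```

The converted string is what should then be passed to Set_Attribute and similar subprograms.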

5.3. Printing DOM trees

The DOM 2.0 standard does not define a common way to read DOM trees from input sources, nor to write them back to output sources. This was added in a later revision of the standard (DOM 3.0), which XML/Ada does not yet support.

However, the package DOM.Core.Nodes provides a Write procedure that can be used for that purpose. It outputs a given DOM tree to an Ada stream. This stream can then be connected to a standard file on the disk, to a socket, or be used to transform the tree into a string in memory.
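As a sketch of connecting Write to a file on disk, the program below parses pref.xml and writes the tree to a new file through Ada.Streams.Stream_IO. It assumes Write takes the stream access first and the node second, leaving its other parameters at their defaults; the output file name pref-copy.xml and the procedure name Write_To_File are hypothetical.

```ada
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;
with Input_Sources.File;    use Input_Sources.File;
with Sax.Readers;           use Sax.Readers;
with DOM.Readers;           use DOM.Readers;
with DOM.Core;              use DOM.Core;
with DOM.Core.Nodes;        use DOM.Core.Nodes;

procedure Write_To_File is
   Input  : File_Input;
   Reader : Tree_Reader;
   Doc    : Document;
   Output : File_Type;   --  Ada.Streams.Stream_IO.File_Type
begin
   --  Build the tree as in the previous examples
   Open ("pref.xml", Input);
   Parse (Reader, Input);
   Close (Input);
   Doc := Get_Tree (Reader);

   --  Write the whole tree to a stream attached to a disk file
   Create (Output, Out_File, "pref-copy.xml");
   Write (Stream (Output), Doc);
   Close (Output);

   --  Only free the reader once the tree is no longer needed
   Free (Reader);
end Write_To_File;
```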

An example is provided in the XML/Ada distribution, dom/test/tostring.adb, which shows how to create a stream that converts the tree into a string in memory, without going through a file on disk.
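Independently of that file, here is a minimal sketch of such an in-memory stream: a type derived from Ada.Streams.Root_Stream_Type whose Write appends every byte to an Unbounded_String. Only standard Ada is used; the demo writes a plain String to it, and under the assumption that DOM.Core.Nodes.Write accepts any stream access, calling Write (S'Access, Doc) should capture a document the same way. The names String_Stream and String_Stream_Demo are hypothetical.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure String_Stream_Demo is

   package String_Streams is
      --  A write-only stream appending everything to an Unbounded_String
      type String_Stream is new Root_Stream_Type with record
         Buffer : Unbounded_String;
      end record;

      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset);

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array);
   end String_Streams;

   package body String_Streams is

      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset)
      is
         pragma Unreferenced (Stream);
      begin
         --  Nothing can be read back: report an empty read
         Last := Item'First - 1;
      end Read;

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array) is
      begin
         --  Each stream element (one byte) becomes one Character
         for E of Item loop
            Append (Stream.Buffer, Character'Val (E));
         end loop;
      end Write;

   end String_Streams;

   use String_Streams;

   S : aliased String_Stream;
begin
   String'Write (S'Access, "<pref name=""color"">blue</pref>");
   Ada.Text_IO.Put_Line (To_String (S.Buffer));
end String_Stream_Demo;
```

Keeping the bytes as Characters leaves the encoding untouched, so the buffer holds exactly what Write produced.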

5.4. Adding information to the tree

The DOM standard does not require each node to record the location it was read from (for instance file:line:column). In fact, storing that information for each node would significantly increase the size of the DOM tree, which is already far from small.

Depending on your application, however, this information might be useful, for instance to report error messages with a correct location.

Fortunately, this can be done relatively easily by extending the type DOM.Readers.Tree_Reader and overriding its Start_Element procedure. The new procedure adds to each node a custom attribute containing the location of that node. Here is an example:

--
--  Copyright (C) 2017, AdaCore
--

with DOM.Readers;       use DOM.Readers;
with Sax.Utils;         use Sax.Utils;
with Sax.Readers;       use Sax.Readers;
with Sax.Symbols;       use Sax.Symbols;

package DOM_With_Location is

   --  A DOM tree reader that also records, on each element, the location
   --  in the input where the element was read
   type Tree_Reader_With_Location is new Tree_Reader with null record;

   overriding procedure Start_Element
      (Handler     : in out Tree_Reader_With_Location;
       NS          : Sax.Utils.XML_NS;
       Local_Name  : Sax.Symbols.Symbol;
       Atts        : Sax.Readers.Sax_Attribute_List);

end DOM_With_Location;

--
--  Copyright (C) 2017, AdaCore
--

with DOM.Core;            use DOM.Core;
with DOM.Core.Attrs;      use DOM.Core.Attrs;
with DOM.Core.Documents;  use DOM.Core.Documents;
with DOM.Core.Elements;   use DOM.Core.Elements;
with Sax.Locators;        use Sax.Locators;

package body DOM_With_Location is

   overriding procedure Start_Element
      (Handler     : in out Tree_Reader_With_Location;
       NS          : Sax.Utils.XML_NS;
       Local_Name  : Sax.Symbols.Symbol;
       Atts        : Sax_Attribute_List)
   is
      Att, Att2 : Attr;
   begin
      --  First create the node as usual
      Start_Element (Tree_Reader (Handler), NS, Local_Name, Atts);

      --  Then create a namespaced attribute holding the current location
      --  of the parser, converted to a string
      Att := Create_Attribute_NS
         (Get_Tree (Handler),
          Namespace_URI  => "http://mydomain.com",
          Qualified_Name => "mydomain:location");
      Set_Value (Att, To_String (Current_Location (Handler)));

      --  Attach it to the element just created; Set_Attribute_Node
      --  returns the attribute it replaced, if any
      Att2 := Set_Attribute_Node (Handler.Current_Node, Att);
   end Start_Element;

end DOM_With_Location;
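A possible way to use this reader, sketched under assumptions: parse pref.xml with Tree_Reader_With_Location instead of Tree_Reader, then read the recorded location back from each pref element. It assumes DOM.Core.Elements provides Get_Attribute_NS taking the element, the namespace URI and the local name; the procedure name Show_Locations is hypothetical.

```ada
with Ada.Text_IO;        use Ada.Text_IO;
with Input_Sources.File; use Input_Sources.File;
with DOM.Core;           use DOM.Core;
with DOM.Core.Documents; use DOM.Core.Documents;
with DOM.Core.Elements;  use DOM.Core.Elements;
with DOM.Core.Nodes;     use DOM.Core.Nodes;
with DOM_With_Location;  use DOM_With_Location;

procedure Show_Locations is
   Input  : File_Input;
   Reader : Tree_Reader_With_Location;
   List   : Node_List;
   N      : Node;
begin
   --  Parse exactly as before, only the reader type changes
   Open ("pref.xml", Input);
   Parse (Reader, Input);
   Close (Input);

   List := Get_Elements_By_Tag_Name (Get_Tree (Reader), "pref");

   for Index in 1 .. Length (List) loop
      N := Item (List, Index - 1);
      --  Same namespace URI as in Start_Element, local name "location"
      Put_Line ("pref read at "
                & Get_Attribute_NS (N, "http://mydomain.com", "location"));
   end loop;

   Free (List);
   Free (Reader);
end Show_Locations;
```

Because the attribute lives in its own namespace, it should not clash with attributes that appear in the input document itself.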