5. The DOM module
DOM is another standard associated with XML, in which the XML stream is represented as a tree in memory. This tree can be manipulated at will: you can add new nodes, remove existing nodes, change attributes, and so on.
Since it contains the whole XML information, it can then in turn be dumped to a stream.
The W3C committee (http://www.w3c.org) has defined several versions of the DOM, each building on the previous one and adding several enhancements.
XML/Ada currently supports the second revision of DOM (DOM 2.0), which mostly adds namespaces over the first revision. The third revision, which adds support for loading and saving XML streams in a standardized fashion, is not supported at this point.
Although it doesn’t support DOM 3.0, XML/Ada provides subprograms for doing similar things.
Only the Core module of the DOM standard is currently implemented; other modules will follow.
Note that the sax-encodings.ads file specifies the encoding used to store the tree in memory. Full compatibility with the XML standard requires that this be UTF-16; however, UTF-8 is generally much more memory-efficient for European languages. You can freely change this setting and recompile XML/Ada.
5.1. Using DOM
In XML/Ada, the DOM tree is built through a special implementation of a SAX parser, provided in the DOM.Readers package.
Using DOM to read an XML document is similar to using SAX: one must set up an input stream, then parse the document and get the tree. This is done with code similar to the following:
```ada
--
--  Copyright (C) 2017, AdaCore
--

with Input_Sources.File; use Input_Sources.File;
with Sax.Readers;        use Sax.Readers;
with DOM.Readers;        use DOM.Readers;
with DOM.Core;           use DOM.Core;

procedure DomExample is
   Input  : File_Input;
   Reader : Tree_Reader;
   Doc    : Document;
begin
   Set_Public_Id (Input, "Preferences file");
   Open ("pref.xml", Input);

   Set_Feature (Reader, Validation_Feature, False);
   Set_Feature (Reader, Namespace_Feature, False);

   Parse (Reader, Input);
   Close (Input);

   Doc := Get_Tree (Reader);

   Free (Reader);
end DomExample;
```
This code is almost exactly the same as the code that was used when demonstrating the use of SAX (Using SAX).
The two main differences are:
- We no longer need to define our own XML reader; we simply use the one provided in DOM.Readers.
- We therefore do not add our own callbacks to react to the XML events. Instead, once parsing is complete, the call to Get_Tree gets a handle on the tree that was created in memory.
The tree can now be manipulated to get access to the values it stores. If we want to implement the same thing we did for SAX, the code would look like:
```ada
--
--  Copyright (C) 2017, AdaCore
--

with Input_Sources.File; use Input_Sources.File;
with Sax.Readers;        use Sax.Readers;
with DOM.Readers;        use DOM.Readers;
with DOM.Core;           use DOM.Core;
with DOM.Core.Documents; use DOM.Core.Documents;
with DOM.Core.Nodes;     use DOM.Core.Nodes;
with DOM.Core.Attrs;     use DOM.Core.Attrs;
with Ada.Text_IO;        use Ada.Text_IO;

procedure DomExample2 is
   Input  : File_Input;
   Reader : Tree_Reader;
   Doc    : Document;
   List   : Node_List;
   N      : Node;
   A      : Attr;
begin
   Set_Public_Id (Input, "Preferences file");
   Open ("pref.xml", Input);

   Set_Feature (Reader, Validation_Feature, False);
   Set_Feature (Reader, Namespace_Feature, False);

   Parse (Reader, Input);
   Close (Input);

   Doc := Get_Tree (Reader);

   List := Get_Elements_By_Tag_Name (Doc, "pref");

   for Index in 1 .. Length (List) loop
      N := Item (List, Index - 1);
      A := Get_Named_Item (Attributes (N), "name");
      Put_Line ("Value of """ & Value (A) & """ is "
                & Node_Value (First_Child (N)));
   end loop;

   Free (List);

   Free (Reader);
end DomExample2;
```
The code is much simpler than with SAX, since most of the work is done internally by XML/Ada. In particular, for SAX we had to take into account the fact that the textual contents of a node could be reported in several events. For DOM, the tree is initially normalized, i.e., all text nodes are collapsed together when possible.
This added simplicity has one drawback: the amount of memory required to represent even a simple tree.
XML/Ada reduces the memory needed to represent a tree by sharing strings as much as possible (this is controlled by constants in dom-core.ads). Still, DOM requires a significant amount of information to be kept for each node.
For really big XML streams, it might prove impossible to keep the whole tree in memory, in which case ad hoc storage might be implemented through the use of a SAX parser. The implementation of dom-readers.adb will prove helpful in creating such a parser.
5.2. Editing DOM trees
Once in memory, DOM trees can be manipulated through subprograms provided by the DOM API.
Each of these subprograms is fully documented both in the Ada specs (the *.ads files) and in the DOM standard itself, which XML/Ada follows.
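As a rough illustration of these editing subprograms, the following sketch builds a small tree from scratch, sets an attribute, then changes it. The procedure name Edit_Example and the element and attribute names are assumptions chosen to mirror the preferences file used earlier; all strings are plain ASCII, so they are valid in the default encoding.

```ada
with Ada.Text_IO;
with DOM.Core;           use DOM.Core;
with DOM.Core.Documents; use DOM.Core.Documents;
with DOM.Core.Elements;  use DOM.Core.Elements;
with DOM.Core.Nodes;     use DOM.Core.Nodes;

--  Hypothetical example: names below are illustrative only
procedure Edit_Example is
   Impl : DOM_Implementation;
   Doc  : Document;
   Root : Element;
   Pref : Element;
   Txt  : Node;
begin
   --  Create an empty document, then attach a root element to it
   Doc  := Create_Document (Impl);
   Root := Append_Child (Doc, Create_Element (Doc, "preferences"));

   --  Add a child element with an attribute and some text
   Pref := Append_Child (Root, Create_Element (Doc, "pref"));
   Set_Attribute (Pref, "name", "pref1");
   Txt  := Append_Child (Pref, Create_Text_Node (Doc, "Value1"));

   --  Setting an existing attribute again replaces its value
   Set_Attribute (Pref, "name", "renamed");

   Ada.Text_IO.Put_Line
     (Get_Attribute (Pref, "name") & " = " & Node_Value (Txt));
end Edit_Example;
```

Nodes are always created through the document that will own them (Create_Element, Create_Text_Node) and only then attached with Append_Child, which is the pattern the DOM standard prescribes.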
One important note, however, concerns the use of strings. Various subprograms allow you to set the textual content of a node, modify its attributes, and so on. Such subprograms take a Byte_Sequence as a parameter.
This Byte_Sequence must always be encoded in the encoding defined in the package Sax.Encodings (as described earlier, changing this package requires recompiling XML/Ada). By default, this is UTF-8.
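Therefore, if you need to set an attribute to a string encoded, for instance, in iso-8859-15, you should use the subprogram Unicode.Encodings.Convert to convert it appropriately. The code would thus look as follows (the attribute name "name" is only illustrative):

```ada
--  "name" is a placeholder attribute name; the source file is assumed
--  to be saved in iso-8859-15, so the literal is converted before use
Set_Attribute
  (N, "name", Convert ("å", From => Get_By_Name ("iso-8859-15")));
```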
5.3. Printing DOM trees
The DOM 2.0 standard does not define a common way to read DOM trees from input sources, nor how to write them back to output sources. This was added in a later revision of the standard (DOM 3.0), which is not yet supported by XML/Ada.
However, the package DOM.Core.Nodes provides a Write procedure that can be used for that purpose. It outputs a given DOM tree to an Ada stream. This stream can then be connected to a standard file on the disk or to a socket, or be used to transform the tree into a string.
The XML/Ada distribution provides an example, dom/test/tostring.adb, which shows how you can create a stream to convert the tree in memory, without going through a file on the disk.
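The following is a minimal sketch of that idea, not the tostring.adb file from the distribution. It assumes a small in-memory stream type (the names String_Streams, String_Stream and Tree_To_String are made up for this example) that appends everything written to it to an unbounded string, and passes that stream to DOM.Core.Nodes.Write.

```ada
with Ada.Streams;           use Ada.Streams;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;
with DOM.Core;              use DOM.Core;
with DOM.Core.Documents;    use DOM.Core.Documents;
with DOM.Core.Nodes;        use DOM.Core.Nodes;

procedure Tree_To_String is

   --  Hypothetical helper: a stream that accumulates output in memory
   package String_Streams is
      type String_Stream is new Root_Stream_Type with record
         Buffer : Unbounded_String;
      end record;

      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset);

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array);
   end String_Streams;

   package body String_Streams is
      overriding procedure Read
        (Stream : in out String_Stream;
         Item   : out Stream_Element_Array;
         Last   : out Stream_Element_Offset)
      is
         pragma Unreferenced (Stream);
      begin
         --  Reading is not needed here: report that nothing was read
         Last := Item'First - 1;
      end Read;

      overriding procedure Write
        (Stream : in out String_Stream;
         Item   : Stream_Element_Array) is
      begin
         --  Append each byte written by the DOM printer to the buffer
         for I in Item'Range loop
            Append (Stream.Buffer, Character'Val (Item (I)));
         end loop;
      end Write;
   end String_Streams;

   Impl : DOM_Implementation;
   Doc  : Document;
   Root : Element;
   Output : aliased String_Streams.String_Stream;
begin
   --  Build a tiny tree so that there is something to print
   Doc  := Create_Document (Impl);
   Root := Append_Child (Doc, Create_Element (Doc, "preferences"));

   --  Dump the tree into the in-memory stream, then show the result
   Write (Stream => Output'Access, N => Doc);
   Ada.Text_IO.Put_Line (To_String (Output.Buffer));
end Tree_To_String;
```

To print directly to the terminal instead, the same Write call can be given Ada.Text_IO.Text_Streams.Stream (Ada.Text_IO.Standard_Output) as its stream.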
5.4. Adding information to the tree
The DOM standard does not mandate each node to have a pointer to the location it was read from (for instance file:line:column). In fact, storing that for each node would increase the size of the DOM tree (not small by any means already) significantly.
Depending on your application, however, this might be useful information to have, for instance if you want to report error messages with a correct location.
Fortunately, this can be done relatively easily by extending the type DOM.Readers.Tree_Reader and overriding Start_Element. You would then add to each node a custom attribute that contains the location of that node. Here is an example.
```ada
--
--  Copyright (C) 2017, AdaCore
--

with DOM.Readers; use DOM.Readers;
with Sax.Utils;   use Sax.Utils;
with Sax.Readers; use Sax.Readers;
with Sax.Symbols; use Sax.Symbols;

package DOM_With_Location is

   type Tree_Reader_With_Location is new Tree_Reader with null record;

   overriding procedure Start_Element
     (Handler    : in out Tree_Reader_With_Location;
      NS         : Sax.Utils.XML_NS;
      Local_Name : Sax.Symbols.Symbol;
      Atts       : Sax.Readers.Sax_Attribute_List);

end DOM_With_Location;
```
```ada
--
--  Copyright (C) 2017, AdaCore
--

with DOM.Core;           use DOM.Core;
with DOM.Core.Attrs;     use DOM.Core.Attrs;
with DOM.Core.Documents; use DOM.Core.Documents;
with DOM.Core.Elements;  use DOM.Core.Elements;
with Sax.Locators;       use Sax.Locators;

package body DOM_With_Location is

   overriding procedure Start_Element
     (Handler    : in out Tree_Reader_With_Location;
      NS         : Sax.Utils.XML_NS;
      Local_Name : Sax.Symbols.Symbol;
      Atts       : Sax_Attribute_List)
   is
      Att, Att2 : Attr;
   begin
      --  First create the node as usual
      Start_Element (Tree_Reader (Handler), NS, Local_Name, Atts);

      --  Then add the new attribute
      Att := Create_Attribute_NS
        (Get_Tree (Handler),
         Namespace_URI  => "http://mydomain.com",
         Qualified_Name => "mydomain:location");
      Set_Value (Att, To_String (Current_Location (Handler)));

      Att2 := Set_Attribute_Node (Handler.Current_Node, Att);
   end Start_Element;

end DOM_With_Location;
```
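A reader extended this way can be used like the standard Tree_Reader. The sketch below is an assumption about how a client might do so: the procedure name Location_Demo is hypothetical, and it assumes the same pref.xml file with pref elements as in the earlier examples. It parses the file with the extended reader, then prints the location attribute that Start_Element attached to each pref element.

```ada
with Ada.Text_IO;        use Ada.Text_IO;
with Input_Sources.File; use Input_Sources.File;
with DOM.Core;           use DOM.Core;
with DOM.Core.Documents; use DOM.Core.Documents;
with DOM.Core.Elements;  use DOM.Core.Elements;
with DOM.Core.Nodes;     use DOM.Core.Nodes;
with DOM_With_Location;  use DOM_With_Location;

--  Hypothetical client of the DOM_With_Location package
procedure Location_Demo is
   Input  : File_Input;
   Reader : Tree_Reader_With_Location;
   List   : Node_List;
begin
   Open ("pref.xml", Input);

   --  Parse, Get_Tree and Free are inherited from Tree_Reader
   Parse (Reader, Input);
   Close (Input);

   List := Get_Elements_By_Tag_Name (Get_Tree (Reader), "pref");

   for Index in 1 .. Length (List) loop
      --  Look up the attribute by the namespace URI and local name
      --  used when it was created in Start_Element
      Put_Line
        (Get_Attribute_NS
           (Item (List, Index - 1), "http://mydomain.com", "location"));
   end loop;

   Free (List);
   Free (Reader);
end Location_Demo;
```

Storing the location as an ordinary attribute keeps the tree compatible with the rest of the DOM API, at the cost of one extra attribute node per element.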