Design and Implementation of a validating XML parser in Haskell: Master's thesis; University of Applied Sciences Wedel
This chapter describes the package hdom, which defines a generic tree data type XmlTree for representing whole XML documents in Haskell. The package provides many functions for processing XML documents represented by this data model.
The core public module of the package hdom is the module XmlTree. It exports all definitions of the basic modules XmlTreeAccess, XmlTreePredicates, XmlTreeTypes, NTree and XmlKeywords.
The module XmlTreeTypes defines the data types for representing any XML document in Haskell. The generic tree data type XmlTree models whole XML documents, including all logical units of XML such as the DTD subset and the document subset. This type is based on the general n-ary tree NTree defined in the module NTree. NTree defines a tree as a node with a list of child nodes; leaves are simply nodes with an empty child list. XML documents are composed of elements, comments, DTD declarations, character references and processing instructions. XmlTree provides a separate node type for each of these logical units.
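The data model described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the actual definition from XmlTreeTypes: the constructor names, the attribute list representation and the helper isLeaf are chosen here for illustration only.

```haskell
-- Simplified sketch of the data model; all names are illustrative
-- and need not match the definitions in XmlTreeTypes or NTree.

-- A general n-ary tree: a node value plus a list of child trees.
data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

-- One constructor per logical unit of an XML document (assumed subset).
data XNode
  = XTag String [(String, String)]  -- element: name and attribute list
  | XText String                    -- character data
  | XCharRef Int                    -- character reference, e.g. &#65;
  | XCmt String                     -- comment
  | XPi String String               -- processing instruction: target, data
  | XDTD String [(String, String)]  -- DTD declaration: kind, attributes
  deriving (Show, Eq)

-- An XML document is an n-ary tree whose nodes are XNode values.
type XmlTree = NTree XNode

-- A leaf is a node with an empty child list.
isLeaf :: NTree a -> Bool
isLeaf (NTree _ cs) = null cs

-- The fragment <p>Hi<!-- note --></p> in this representation.
example :: XmlTree
example =
  NTree (XTag "p" [])
    [ NTree (XText "Hi") []
    , NTree (XCmt " note ") []
    ]
```

Because the tree structure and the node type are separated, functions written against NTree do not need to know anything about XML.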
The module NTree defines a general n-ary tree structure NTree as well as filter functions (see Section 2.3) and combinators (see Section 2.4) for processing this data type. The filters and combinators have been adapted from HaXml [WWW21] and generalized. The filter functions of HaXml work only on the document subset of XML documents. Because they are defined on the generic tree data model NTree, the functions of the Haskell XML Toolbox can be used to process whole XML documents.
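The idea of generic filters and combinators can be sketched as follows. This is a minimal illustration, assuming a filter is a function from a tree to a list of trees; the names TFilter, this, none, getChildren, isOfNode, o, +++ and deep are modelled loosely on HaXml conventions and are not guaranteed to match the signatures in the module NTree.

```haskell
-- Sketch of generic filters over an n-ary tree; names are assumptions.

data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

-- A filter maps one tree to a list of result trees.
-- An empty result list means that the filter failed.
type TFilter a = NTree a -> [NTree a]

-- Identity filter: always succeeds with its input.
this :: TFilter a
this t = [t]

-- Filter that always fails.
none :: TFilter a
none _ = []

-- Select the children of the root node.
getChildren :: TFilter a
getChildren (NTree _ cs) = cs

-- Keep the tree if its root node satisfies the predicate.
isOfNode :: (a -> Bool) -> TFilter a
isOfNode p t@(NTree n _) = [t | p n]

-- Sequential composition: apply g, then f to every result of g.
o :: TFilter a -> TFilter a -> TFilter a
f `o` g = concatMap f . g

-- Union: concatenate the results of both filters.
(+++) :: TFilter a -> TFilter a -> TFilter a
(f +++ g) t = f t ++ g t

-- Top-down search: return the topmost subtrees accepted by f.
deep :: TFilter a -> TFilter a
deep f t = case f t of
  [] -> concatMap (deep f) (getChildren t)
  rs -> rs

-- The filters are independent of the node type, here shown with Int.
sample :: NTree Int
sample = NTree 1 [NTree 2 [NTree 4 []], NTree 3 []]
```

For example, deep (isOfNode even) sample selects the subtree rooted at 2 and does not descend further into it. Since nothing in these definitions refers to XML, the same combinators apply to element nodes and DTD nodes alike once the node type is instantiated accordingly.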
The module XmlTreeAccess provides basic filter functions for constructing, editing or selecting parts of the data type XmlTree. Furthermore it provides functions for processing the attribute list of XmlTree nodes.
The module XmlTreePredicates provides basic predicate filter functions. These functions behave like predicates in logic, but encode the truth value in their result list: if the condition holds for a node, they return a list containing that node, otherwise they return an empty list.
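A predicate filter of this kind might look as follows. The sketch assumes a reduced node type and hypothetical names (isXText, isXCmt, isTag, neg, andP, orP); the actual predicates exported by XmlTreePredicates may be named and typed differently.

```haskell
-- Sketch of predicate filters; node type and names are assumptions.

data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

data XNode
  = XTag String [(String, String)]  -- element: name and attribute list
  | XText String                    -- character data
  | XCmt String                     -- comment
  deriving (Show, Eq)

type XmlTree   = NTree XNode
type XmlFilter = XmlTree -> [XmlTree]

-- True (singleton list) for text nodes, false (empty list) otherwise.
isXText :: XmlFilter
isXText t@(NTree (XText _) _) = [t]
isXText _                     = []

-- True for comment nodes.
isXCmt :: XmlFilter
isXCmt t@(NTree (XCmt _) _) = [t]
isXCmt _                    = []

-- True for element nodes with the given name.
isTag :: String -> XmlFilter
isTag name t@(NTree (XTag n _) _)
  | n == name = [t]
isTag _ _ = []

-- Logical connectives lifted to predicate filters.
neg :: XmlFilter -> XmlFilter
neg f t = [t | null (f t)]

andP :: XmlFilter -> XmlFilter -> XmlFilter
andP f g t = [t | not (null (f t)), not (null (g t))]

orP :: XmlFilter -> XmlFilter -> XmlFilter
orP f g t = [t | not (null (f t)) || not (null (g t))]
```

Encoding truth values as lists lets predicates be combined with ordinary filters without a separate Boolean layer.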
The module XmlKeywords provides constants that are used for representing DTD keywords and attributes in the XmlTree.
Besides the module XmlTree and its sub-modules, there are several other public modules for processing the XmlTree data structure.
XmlTreeToString provides functions for transforming an XmlTree back into a string. The string shows the XML document in XML syntax.
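The transformation back into XML syntax can be sketched as a recursive traversal. The function name xmlTreeToString and the reduced node type are assumptions made for this illustration; the sketch performs no escaping and covers only elements, text and comments.

```haskell
-- Sketch of serializing a tree to XML syntax; names are assumptions.

data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

data XNode
  = XTag String [(String, String)]  -- element: name and attribute list
  | XText String                    -- character data
  | XCmt String                     -- comment
  deriving (Show, Eq)

type XmlTree = NTree XNode

-- Render a tree in XML syntax. Text and attribute values are emitted
-- unchanged, so special characters must already be escaped.
xmlTreeToString :: XmlTree -> String
xmlTreeToString (NTree node cs) = case node of
  XText s -> s
  XCmt s  -> "<!--" ++ s ++ "-->"
  XTag n as
    | null cs   -> "<" ++ n ++ attrs as ++ "/>"
    | otherwise -> "<" ++ n ++ attrs as ++ ">"
                   ++ concatMap xmlTreeToString cs
                   ++ "</" ++ n ++ ">"
  where
    -- Render an attribute list as  name="value"  pairs.
    attrs as = concatMap (\(k, v) -> " " ++ k ++ "=\"" ++ v ++ "\"") as

-- The fragment <a href="x">Hi<!--c--><br/></a> as a tree.
doc :: XmlTree
doc =
  NTree (XTag "a" [("href", "x")])
    [ NTree (XText "Hi") []
    , NTree (XCmt "c") []
    , NTree (XTag "br" []) []
    ]
```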
EditFilters provides some filters for transforming an XmlTree:
Remove text nodes which contain only white space from the tree.
Remove comment nodes from the tree.
Transform special characters such as <, >, ", ' and & in text nodes into character references.
Manipulate text nodes with customized functions.
Convert CDATA nodes to text nodes by escaping all special characters.
Convert character references to normal text.
Create a canonicalized representation of XML documents [WWW03].
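Three of the transformations listed above can be sketched in a few lines. The names pruneTree, removeWhiteSpace, removeComments, escapeChar and escapeText are hypothetical and the node type is reduced; the sketch illustrates the technique and is not the code of EditFilters.

```haskell
-- Sketch of some tree-editing filters; all names are assumptions.
import Data.Char (isSpace)

data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

data XNode
  = XTag String [(String, String)]  -- element: name and attribute list
  | XText String                    -- character data
  | XCmt String                     -- comment
  deriving (Show, Eq)

type XmlTree = NTree XNode

-- Rebuild the tree top-down, dropping every subtree whose root node
-- is rejected by the predicate.
pruneTree :: (a -> Bool) -> NTree a -> [NTree a]
pruneTree keep (NTree n cs)
  | keep n    = [NTree n (concatMap (pruneTree keep) cs)]
  | otherwise = []

-- Remove text nodes that contain only white space (or are empty).
removeWhiteSpace :: XmlTree -> [XmlTree]
removeWhiteSpace = pruneTree (not . isWhiteSpaceText)
  where
    isWhiteSpaceText (XText s) = all isSpace s
    isWhiteSpaceText _         = False

-- Remove comment nodes.
removeComments :: XmlTree -> [XmlTree]
removeComments = pruneTree (not . isCmt)
  where
    isCmt (XCmt _) = True
    isCmt _        = False

-- Replace a special character by a decimal character reference.
escapeChar :: Char -> String
escapeChar c
  | c `elem` "<>\"'&" = "&#" ++ show (fromEnum c) ++ ";"
  | otherwise         = [c]

-- Escape the special characters in all text nodes of a tree.
escapeText :: XmlTree -> XmlTree
escapeText (NTree (XText s) cs) = NTree (XText (concatMap escapeChar s)) cs
escapeText (NTree n cs)         = NTree n (map escapeText cs)
```

The removing filters return a list so that they can also delete the root node itself, in which case the result is empty.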
FormatXmlTree creates a string with a tree representation of an XmlTree. This is useful to see how the data type XmlTree represents an XML document.
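A tree representation of this kind can be produced by indenting each node according to its depth. The function formatTree below is a hypothetical sketch that works for any node type with a Show instance; the output format of FormatXmlTree itself may differ.

```haskell
-- Sketch of an indented tree printer; the name formatTree is an assumption.

data NTree a = NTree a [NTree a]
  deriving (Show, Eq)

-- One line per node, indented by two spaces per tree level.
formatTree :: Show a => NTree a -> String
formatTree = unlines . go 0
  where
    go depth (NTree n cs) =
      (replicate (2 * depth) ' ' ++ show n)
        : concatMap (go (depth + 1)) cs

-- A small tree with Int nodes for demonstration.
sample :: NTree Int
sample = NTree 1 [NTree 2 [], NTree 3 [NTree 4 []]]
```

Printing the result with putStr (formatTree sample) shows the nesting of the nodes directly, which helps when checking how a parsed document is represented.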
XmlFilterCombinators provides special filter combinators for processing an XmlTree. These filters can only operate on the data type XmlTree and not on the general n-ary tree data type NTree.