Design and Implementation of a validating XML parser in Haskell: Master's thesis; University of Applied Sciences Wedel
Prev	Chapter 2. Package hdom	Next

2.5. Examples for filters and filter combinators

After discussing filters and filter combinators in depth, this section shows the use of these functions in real life examples. Because the filters work on the generic tree data type XmlTree, which represents whole XML documents, these functions form the basic uniform design of XML processing applications. The whole XML Parser of the Haskell XML Toolbox works internally with filters.

2.5.1. Removing comments

This example describes in depth the use of filters and combinators for removing comments from an XmlTree. It illustrates, how a complex filter can be constructed by combining very simple ones.

Removing comments from an XmlTree is done by transforming the tree with a special filter. This filter returns an empty list for comment nodes, all other nodes are simply returned. The result is a new tree without comment nodes.

To implement this filter, a predicate function is needed first, which detects comment nodes. Building on this filter a more complex one will be constructed that returns an empty list for comment nodes, but returns all other nodes. The final filter will apply this filter to the whole tree and create a new tree from its results.

The function isXCmt is a predicate filter that detects comment nodes. It takes a node and checks if the node is of type XCmt. If the node is a comment, a list with this node is returned, otherwise the result is an empty list.

isXCmt              :: XmlFilter
isXCmt              = isOfNode isXCmtNode

The predicate filter isXCmt itself bases on the basic selection filter isOfNode, which is exported by the module XmlTree, and the predicate function isXCmtNode. The selection filter isOfNode takes the predicate function as a parameter and returns a list with the passed node if the predicate function returns true for this node. Otherwise an empty list is returned. The predicate function isXCmtNode uses pattern matching to figure out if a node is of type XCmt.

isOfNode            :: (node -> Bool) -> TFilter node
isOfNode p t@(NTree n _)
    | p n       = [t]
    | otherwise	= []

isXCmtNode          :: XNode -> Bool
isXCmtNode (XCmt _) = True
isXCmtNode _        = False

The filter removeComment takes a node and returns an empty list if the node is a comment, otherwise a list with the passed node is returned. The filter removeComment combines the simple filter none and the above described filter isXCmt with the combinator when. If the predicate filter isXCmt is true, none returns an empty list. Otherwise a list with the node is returned.

removeComment		:: XmlFilter
removeComment
    = none `when` isXCmt

The main filter removeAllComment is a filter for removing all comments from an XmlTree. It uses the recursive transformation filter applyBottomUp, which applies the filter removeComment to the whole tree. The result is a new tree where all comments have been removed.

removeAllComment	:: XmlFilter
removeAllComment
    = applyBottomUp removeComment

This example shows very clearly how complex filters can be constructed out of simple ones. The functions isOfNode, none, when and applyBottomUp are already exported by the module XmlTree. They define own control structures and hide processing of the XmlTree from the programmer. The programmer does not have to worry about how to apply a filter to the whole tree and how to transform it. The use of predefined combinators and filters makes it possible to program on a very high abstraction level.

2.5.2. Merging internal and external DTD subset

After discussing the simple example for removing comments, this example will show a much more complex use of filters. Having defined combinators and basic filters already, it shows how they can usefully be combined into a complex function for merging the internal and external DTD subset. The XML parser of the Haskell XML Toolbox uses this function internally. The filter is part of the module DTDProcessing.

A validating XML parser must merge the internal and external DTD subset. The document type declaration can point to an external subset containing declarations, it can contain the declarations directly in an internal subset, or can do both. If both the external and internal subsets are used, the internal subset must occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset [WWW01].

The function mergeDTDs takes two lists with the declarations of the internal and external subset as input and returns both subsets merged. Because the internal subset dominates over the external one, its declarations can be prepended before the filtered result list of the external subset. The operator ++ is a standard Haskell function for concatenating two lists. The expression mergeDTDentry dtdInt creates a complex filter function that filters declarations from the external subset, which have been already declared in the internal subset. This complex filter is applied by the combinator $$ to the whole declaration list of the external DTD subset.

mergeDTDs	:: XmlTrees -> XmlTrees -> XmlTrees
mergeDTDs dtdInt dtdExt
    = dtdInt ++ (mergeDTDentry dtdInt $$ dtdExt)

The complex filter mergeDTDentry is initialized with the internal subset. By applying this filter to each node of the external subset, it filters all entries that are already declared in the internal subset. It uses the same semantic like the example for removing comments. If a declaration occurs in both subsets, the filter returns an empty list, otherwise it returns a list with the element from the external subset. This behavior is defined by the expression: none `when` found.

Internally the filter mergeDTDentry bases on the filter filterDTDNode, which returns a node that occurs in both subsets. This filter is initialized with one node from the internal subset and checks if a node from the external subset equals this node. Because there usually exists more than one declaration in the internal subset, a list of this filter function is constructed, by applying filterDTDNode with the list transformation function map to all nodes of the internal subset. The result is a list of filter functions. This list of filters is later applied to a node of the external subset by the combinator cat.

mergeDTDentry	:: XmlTrees -> XmlFilter
mergeDTDentry dtd
    = none `when` found
      where
      filterList = map filterDTDNode dtd  -- construct the list of filters
      found      = cat filterList         -- concatenate the filters (set union)

The function filterDTDNode constructs a filter which returns a list with a node that occurs in both subsets. The filter is initialized with a node from the internal subset.

filterDTDNode	:: XmlTree -> XmlFilter

If the nodes from the internal and external subsets are both of type ELEMENT, NOTATION or ENTITY, the filter checks if the values of their attributes "name" are equal. If this is the case, a list with the node is returned, otherwise an empty list is returned.

filterDTDNode (NTree (XDTD dtdElem al) _)
    | dtdElem `elem` [ELEMENT, NOTATION, ENTITY]
        = filterElement
          where
          filterElement n@(NTree (XDTD dtdElem' al') _cl')
              | dtdElem == dtdElem' &&
                getAttrValue a_name al' == getAttrValue a_name al
                  = [n]
              | otherwise  = []
          filterElement _  = []

If the nodes from the internal and external subset are of type ATTLIST, the filter has to check if the values of the attributes for the element name and attribute name are equal. If this is the case, the declarations are equal and a list with this node is returned, otherwise an empty list is returned.

filterDTDNode (NTree (XDTD ATTLIST al) _)
    = filterAttlist
      where
      filterAttlist n@(NTree (XDTD ATTLIST al') _cl')
          | getAttrValue a_name  al' == getAttrValue a_name  al &&
            getAttrValue a_value al' == getAttrValue a_value al
              = [n]
      filterAttlist _ = []

For all other types a filter is constructed that returns always an empty list.

filterDTDNode _ =  none

This example might be a little bit confusing for people being not familiar with Haskell, but it demonstrates how even very complex tasks like merging the internal and external DTD subset can be implemented with filter functions and combinators. In this case a list of parameterized filter functions is constructed at runtime, which is applied to each node of the external DTD subset.