Design and Implementation of a validating XML parser in Haskell: Master's thesis; University of Applied Sciences Wedel | ||
---|---|---|

Prev | Chapter 2. Package hdom | Next |

Functional programming languages like Haskell allow the use of higher-order functions, which take some functions as input, return a function as a result, or both. Filter combinators are higher-order functions, or operators, for combining several filters (see Section 2.3). By combining simple filters, more complex functions can be created. Because all filters share the same type, it is possible that any filter can be composed with any other. All filters can be mixed and matched freely, because all functions in Haskell are side-effect free.

The idea of filter combinators was adapted and extended from HaXml [WWW21]. In their paper "Haskell and XML: Generic Combinators or Type-Based Translation?" [WWW22] Malcom Wallace and Colin Runciman describe various algebraic laws for their combinators.

The use of combinators makes it possible to hide details of programming over data structures. All details of data structure manipulation are hidden in combinator functions. In effect, the combinators define problem specific control structures. With this approach it is possible to reach a form of expression that is natural for the problem itself.

Conceptually, combinators are much like Unix pipes in that they allow building up more complex computational sequences by flexibly arranging highly specialized tools [WWW24].
The equivalent to Unix pipes it the "Irish composition" combinator, represented by the infix operator "`o`". This combinator applies one filter to the results of another one. This is similar to a pipe, which passes the output of one program as the input to another one.

The combinator library provides all functions that are necessary for traversing and processing XML documents represented as an `XmlTree`. Haskell allows the definition of own infix operator symbols. Some combinators are defined as infix operators where it seemed more natural.

**Basic filter combinators**

`o :: (a -> [b]) -> (c -> [a]) -> c -> [b]`Irish composition. Sequential composition of two filters. The left filter is applied to the result of the right filter. (conjunction)

`(+++) :: (a -> [b]) -> (a -> [b]) -> (a -> [b])`Binary parallel composition, the function unifies the results of two filters sequential. Each filter uses a copy of state. (union)

`cat :: [a -> [b]] -> (a -> [b])`Concatenates the results of all filters in a list, a list version of union

`+++`. The combinator sticks lots of filters together.`($$) :: (a -> [b]) -> [a] -> [b]`Applies a filter to a list.

`processChildren :: TFilter node -> TFilter node`Applies a filter to the child list of a node.

**Choice combinators**

`orElse :: (a -> [b]) -> (a -> [b]) -> (a -> [b])`Directional choice. Second filter is only applied if the first one produces an empty list.

`(?>) :: (a -> [b]) -> ThenElse (a -> [b]) -> (a -> [b])`In combination with the type

`ThenElse a = a :> a`this combinator models an expression that resembles the conditional expression "`p ? f : g`" of C: "`p ?> f :> g`". If the predicate filter`p`is true, the filter`f`is applied otherwise`g`is applied.`when :: TFilter node -> TFilter node -> TFilter nod`First filter is only applied to the passed node, if the second filter produces a list with contents, otherwise a list with the passed node is returned. The second filter is typically a predicate filter.

Definition:

`f `when` g = g ?> f :> this``whenNot :: TFilter node -> TFilter node -> TFilter node`First filter is only applied to the passed node, if the second filter produces an empty list, otherwise a list of the passed node is returned. The second filter is typically a predicate filter.

Definition:

`f `whenNot` g = g ?> this :> f``guards :: TFilter node -> TFilter node -> TFilter node`First filter is only applied to the passed node, if the second filter produces a list with contents, otherwise an empty list is returned. The second filter is typically a predicate filter.

Definition:

`g `guards` f = g ?> f :> none``containing :: (a -> [b]) -> (b -> [c]) -> a -> [b]`Pruning: Keep only those nodes for which the second filter produces a list with contents and apply the first filter to these nodes.

Definition:

`f `containing` g = filter (not . null . g) . f``notContaining :: (a -> [b]) -> (b -> [c]) -> a -> [b]`Pruning: Keep only those nodes for which the second filter produces an empty list and apply the first filter to these nodes.

Definition:

`f `notContaining` g = filter (null . g) . f``(/>) :: TFilter node -> TFilter node -> TFilter node`Interior search, pronounced

*"slash"*. First filter is applied to a node, the children of the result are taken and the second filter is applied to them, these children are returned. (meaning g inside f)Definition:

`f /> g = g `o` getChildren `o` f``(</) :: TFilter node -> TFilter node -> TFilter node`Exterior search, pronounced

*"outside"*. First filter is applied to a node, the second to its children. If both filters return a result, the node is returned. (meaning f containing g)Definition:

`f </ g = f `containing` (g `o` getChildren)`

**Recursive search combinators**

`deep :: TFilter node -> TFilter node`Filter is applied to each node of the tree. If the filter returns a result, processing is stopped. If not, the filter is applied to the next node of the tree. (top down traversal)

`deepest :: TFilter node -> TFilter node`See

`deep`, but bottom up traversal.`multi :: TFilter node -> TFilter node`Returns all successful applications of the filter to the whole tree.

**Recursive transformation combinators**

`applyBottomUp :: TFilter node -> TFilter node`Constructs a new tree by applying the passed filter to each node of the tree. The tree is traversed bottom-up.

`applyTopDown :: TFilter node -> TFilter node`Constructs a new tree by applying the passed filter to each node of the tree. The tree is traversed top-down.

`applyBottomUpIfNot :: TFilter node -> TFilter node -> TFilter node`Constructs a new tree with a guarded top down transformation. The first filter is only applied to those nodes for which the second filter produces a result.

**Monadic composition**

`liftM :: Monad m => (a -> [b]) -> a -> m [b]`Lift a normal filter to a monadic filter so that it can be used in monads.

`(.<) :: Monad m => (b -> m [c]) -> (a -> m[b]) -> (a -> m[c])`Monadic Irish composition. Sequential composition of two filters. The first filter is applied to the result of the second filter (conjunction). Monadic version of the basic filter

`o`.`($$<) :: Monad m => (a -> m [b]) -> [a] -> m [b]`Apply a monadic filter to a list. Monadic version of the basic filter combinator

`cat`.

**Special Filter combinators for XmlTree**

`whenOk :: XmlFilter -> XmlFilter ->XmlFilter`Applies the first filter to a node and checks the result for errors. In case of an error it stops processing and returns the list with the

`XError`node. Otherwise processing is continued and the second filter is applied to the result.`whenOkM :: Monad m => (a -> XmlTrees) -> (XmlTrees -> m XmlTrees) -> a -> m XmlTrees`Monadic version of

`whenOk`.

Some combinators like the basic filter combinators and choice combinators can be expressed more naturally by operators. In Haskell completely new operators can be invented by using infix versions of functions and defining precedences and associativities for them.

The combinator operators are listed in following tables in decreasing order of their binding power. Combinators defined as prefix functions have to be written in back-quotes to be used as operators.