Design and Implementation of a validating XML parser in Haskell: Master's thesis; University of Applied Sciences Wedel
Chapter 2. Package hdom
Filters are the basic functions for processing the XmlTree representation of XML documents. A filter takes a node or a list of nodes and returns some sequence of nodes. The result list might be empty, might contain a single item, or it could contain several items.
The idea of filters was adopted from HaXml [WWW21], but has been modified. In HaXml, filters work only on the document subset part of XML documents. The Haskell XML Toolbox uses the generic tree data type NTree for modeling XML documents. This generic data model makes it possible to generalize HaXml's filter idea so that filters can process the whole XML document, including both the DTD subset and the document subset. This generalization allows a very uniform design of XML processing applications based on filters. In fact, the whole XML parser of the Haskell XML Toolbox works internally with filters. The differences between HaXml's approach and the approach of the Haskell XML Toolbox are described in depth in Section 5.2.
TFilter and TSFilter are the filter types for the general n-ary tree defined by the data type NTree. A TFilter takes a single node and returns a list of nodes. A TSFilter takes a list of nodes and also returns a list of nodes.
    type TFilter  node = NTree node  -> NTrees node
    type TSFilter node = NTrees node -> NTrees node
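To make the two filter types concrete, the following is a minimal sketch rather than the toolbox's actual source. The definition of NTree shown here (a node value plus a list of child trees) is an assumption, since the data type is introduced elsewhere in the thesis; childrenOf, leavesOnly and sample are hypothetical names used only for illustration.

```haskell
-- Assumed shape of the generic n-ary tree: a node value plus a child list.
data NTree node = NTree node (NTrees node)
  deriving (Eq, Show)

type NTrees node = [NTree node]

-- The two filter types from the text.
type TFilter  node = NTree node  -> NTrees node
type TSFilter node = NTrees node -> NTrees node

-- Hypothetical TFilter: returns the children of the passed node.
childrenOf :: TFilter node
childrenOf (NTree _ cs) = cs

-- Hypothetical TSFilter: keeps only those trees that have no children.
leavesOnly :: TSFilter node
leavesOnly ts = [t | t@(NTree _ []) <- ts]

-- Small sample tree for experimenting in GHCi.
sample :: NTree String
sample = NTree "root" [NTree "a" [], NTree "b" [NTree "c" []]]
```

Loaded into GHCi, childrenOf sample yields the two subtrees rooted at "a" and "b", and leavesOnly (childrenOf sample) keeps only the subtree rooted at "a", because "b" still has a child.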
XmlFilter and XmlSFilter are based on these types. They work only on trees whose nodes are of type XNode.
    type XmlFilter  = TFilter XNode
    type XmlSFilter = TSFilter XNode
Filters can be used to select parts of a document, to construct new document parts, or to change document parts. They can even be used for checking validity constraints, as described in Chapter 4. In this case a filter returns an empty list if the document is valid, and a list of errors otherwise.
Filters can sometimes be thought of as predicates. In this case they decide whether or not to keep their input. Unlike a predicate in logic, such a filter does not return a truth value: if the predicate is false, it returns an empty list; if the predicate is true, it returns a list containing the passed element.
All filters share the same basic type, so they can be combined with the help of the combinators described in Section 2.4. With this approach, complex filters can be defined on the basis of simpler ones.
The following list describes the basic filter functions for processing XML documents represented as an XmlTree. Some of them are higher-order functions that return a filter function as their result; their arguments are used to construct parameterized filters. This is useful, for example, for constructing filters that return only nodes with a certain property.
Simple filters
Takes any node and always returns an empty list. Algebraically, this filter is the zero.
Takes any node and always returns a list containing the passed node. Algebraically, this filter is the unit.
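The two simple filters can be sketched as follows. The names none and this are assumptions, because the function names are not shown in the list above, and the NTree definition is the assumed one. To illustrate what "zero" and "unit" mean, the sketch also includes a hypothetical sequential composition andThen; the toolbox's real combinators are the subject of Section 2.4 and may differ.

```haskell
-- Assumed tree and filter types.
data NTree node = NTree node (NTrees node)
  deriving (Eq, Show)
type NTrees node = [NTree node]
type TFilter node = NTree node -> NTrees node

-- Zero filter (assumed name): ignores its input and returns nothing.
none :: TFilter node
none _ = []

-- Unit filter (assumed name): returns its input unchanged.
this :: TFilter node
this t = [t]

-- Hypothetical sequential composition: apply f, then g to every result of f.
andThen :: TFilter node -> TFilter node -> TFilter node
andThen f g = concatMap g . f

-- Hypothetical helper filter and sample tree for experimenting.
childrenOf :: TFilter node
childrenOf (NTree _ cs) = cs

sample :: NTree String
sample = NTree "root" [NTree "a" [], NTree "b" []]
```

Under this composition, combining any filter with this on either side leaves its result unchanged, while combining it with none on either side always gives the empty list. This is the sense in which the two filters behave like one and zero under multiplication.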
Selection filters
Takes a predicate function and returns a filter. The filter returns a list with the passed node if the predicate function is true for the node; otherwise it returns an empty list.
The same as isOfNode, except that a reference node is taken instead of a predicate function.
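A possible reading of the two selection filters is sketched below. The name isOfNode is taken from the text; isNode is an assumed name for the variant that compares against a reference node, and the Eq constraint it needs is likewise an assumption.

```haskell
-- Assumed tree and filter types.
data NTree node = NTree node (NTrees node)
  deriving (Eq, Show)
type NTrees node = [NTree node]
type TFilter node = NTree node -> NTrees node

-- Keep the passed tree if the predicate holds for its root node.
isOfNode :: (node -> Bool) -> TFilter node
isOfNode p t@(NTree n _)
  | p n       = [t]
  | otherwise = []

-- Assumed name: keep the passed tree if its root node equals the reference node.
isNode :: Eq node => node -> TFilter node
isNode n = isOfNode (== n)

-- Sample tree for experimenting.
sample :: NTree String
sample = NTree "root" [NTree "a" []]
```

Note that only the root node of the passed tree is inspected; the children are carried along untouched when the tree is kept.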
Filters for modifying nodes
Takes a node and returns a filter. The filter always returns a list with this node and ignores the passed one.
Takes a node and returns a filter. The filter replaces a passed node with the initialized one. The children of the passed node are added to the new node.
Like replaceNode except that in this case the children are replaced by an initialized list. The passed node itself is not modified.
Takes a function for modifying nodes and returns a filter. The function (node -> Maybe node) is applied to the passed node; the children are not modified.
Like modifyNode0 except that the type of the modification function (node -> node) is different.
Takes a filter that processes lists of nodes and returns a new filter. The new filter applies the filter TSFilter node to the child list of a passed node. The node itself is not modified.
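The modification filters described above might look roughly like the following sketch. Only replaceNode and modifyNode0 are named in the text; mkNTree, replaceChildren, modifyNode and modifyChildren are assumed names. The behaviour of modifyNode0 when the function returns Nothing is not stated above; returning an empty list is an assumption.

```haskell
-- Assumed tree and filter types.
data NTree node = NTree node (NTrees node)
  deriving (Eq, Show)
type NTrees node = [NTree node]
type TFilter  node = NTree node  -> NTrees node
type TSFilter node = NTrees node -> NTrees node

-- Assumed name: always return the given tree and ignore the passed one.
mkNTree :: NTree node -> TFilter node
mkNTree t _ = [t]

-- Replace the node value; the children of the passed node are kept.
replaceNode :: node -> TFilter node
replaceNode n (NTree _ cs) = [NTree n cs]

-- Assumed name: keep the node value, replace the child list.
replaceChildren :: NTrees node -> TFilter node
replaceChildren cs (NTree n _) = [NTree n cs]

-- Apply a partial modification to the node value; children stay as they are.
-- Assumption: a result of Nothing removes the tree (empty result list).
modifyNode0 :: (node -> Maybe node) -> TFilter node
modifyNode0 f (NTree n cs) = maybe [] (\n' -> [NTree n' cs]) (f n)

-- Assumed name: total variant of modifyNode0.
modifyNode :: (node -> node) -> TFilter node
modifyNode f = modifyNode0 (Just . f)

-- Assumed name: apply a list filter to the children; the node is unchanged.
modifyChildren :: TSFilter node -> TFilter node
modifyChildren f (NTree n cs) = [NTree n (f cs)]

-- Sample tree for experimenting.
sample :: NTree String
sample = NTree "b" [NTree "c" []]
```

Defining modifyNode in terms of modifyNode0 keeps the two variants consistent: the only difference is whether the modification function is allowed to fail.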
Predicate filters
Takes a TagName and returns a filter. The filter returns a list with the passed node if its name equals TagName, otherwise an empty list is returned.
Takes a predicate function (TagName -> Bool) and returns a filter. The filter applies the predicate function to the name of a passed node. If the predicate function is true, the filter returns a list with the node. Otherwise an empty list is returned.
Constructs a predicate filter for attributes whose value satisfies a predicate function. The constructed filter returns a list with the passed node if the node has an attribute with the name AttrName and its value satisfies the predicate function (AttrValue -> Bool). Otherwise an empty list is returned.
Many further predicate filters are provided by the module XmlTreePredicates: isXCdata, isXCharRef, isXCmt, isXDTD, isXEntityRef, isXError, isXNoError, isXPi, isXTag, isXText, etc. These filters are used for identifying special types of nodes.
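The predicate filters can be illustrated with a deliberately simplified model. The XNode type below has only three constructors and stores attributes as an association list; both are assumptions made for this sketch, not the toolbox's full definition. The names isTag, isOfTag and attrHasValue are also assumed, while isXTag and isXText are taken from the list of XmlTreePredicates above.

```haskell
-- Assumed tree and filter types.
data NTree node = NTree node [NTree node]
  deriving (Eq, Show)
type TFilter node = NTree node -> [NTree node]

type TagName   = String
type AttrName  = String
type AttrValue = String
type TagAttrl  = [(AttrName, AttrValue)]  -- assumed representation

-- Simplified node type: a subset of the node kinds named in the text.
data XNode = XText String | XCmt String | XTag TagName TagAttrl
  deriving (Eq, Show)

type XmlFilter = TFilter XNode

-- Assumed name: keep element nodes whose name satisfies the predicate.
isOfTag :: (TagName -> Bool) -> XmlFilter
isOfTag p t@(NTree (XTag n _) _) | p n = [t]
isOfTag _ _ = []

-- Assumed name: keep element nodes with exactly this name.
isTag :: TagName -> XmlFilter
isTag n = isOfTag (== n)

-- Assumed name: keep element nodes that have the attribute and whose
-- attribute value satisfies the predicate.
attrHasValue :: AttrName -> (AttrValue -> Bool) -> XmlFilter
attrHasValue a p t@(NTree (XTag _ al) _)
  | maybe False p (lookup a al) = [t]
attrHasValue _ _ _ = []

-- Node-kind predicates in the style of XmlTreePredicates.
isXTag, isXText :: XmlFilter
isXTag t@(NTree (XTag _ _) _) = [t]
isXTag _ = []
isXText t@(NTree (XText _) _) = [t]
isXText _ = []

-- Sample element: <a href="x.html">link</a>
sample :: NTree XNode
sample = NTree (XTag "a" [("href", "x.html")]) [NTree (XText "link") []]
```

A missing attribute is treated the same way as a failing predicate: in both cases attrHasValue returns the empty list.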
Construction filters
The created filter constructs an XTag node with the name TagName, an attribute list TagAttrl and a list of children. The passed node is ignored by the filter.
The created filter constructs an XText node with text data. The passed node is ignored by the filter. There exists a shortcut function txt that does the same.
The created filter constructs an XCharRef node with a reference number to a character. The passed node is ignored by the filter.
The created filter constructs an XEntityRef node with an entity reference. The passed node is ignored by the filter.
The created filter constructs an XCmt node with text data. The passed node is ignored by the filter. There exists a shortcut function cmt that does the same.
The created filter constructs an XDTD node. The type of the node is specified by the algebraic data type DTDElem. The node has attributes and a list of children. The passed node is ignored by the filter.
The created filter constructs an XPi node with a name and attributes. The passed node is ignored by the filter.
The created filter constructs an XCdata node with text data. The passed node is ignored by the filter.
The created filter constructs an XError node with an error level and an error message. The passed node is stored in the child list of this error node, so that the location where the error occurred can be preserved. The shortcut functions warn, err and fatal of type String -> XmlFilter can be used to create specific error nodes.
The created filter constructs an XTag node with the name TagName and the attribute list TagAttrl. Its child list is constructed by applying the filter list [XmlFilter] to the passed node. There exists a shortcut function tag that does the same.
The created filter constructs a simple XTag node. It works like mkXElem except that no attribute list is created. There exists a shortcut function stag that does the same.
The created filter constructs an empty XTag node. It works like mkXSElem except that no child list is created. There exists a shortcut function etag that does the same.
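Some of the construction filters above can be sketched as follows, again over a simplified XNode. The names mkXElem, mkXSElem, tag, stag, etag, txt, warn, err and fatal come from the text; mkXText, mkXEElem and mkXError are assumed names, and encoding the error levels as the integers 1 to 3 is an assumption as well.

```haskell
-- Assumed tree and filter types.
data NTree node = NTree node [NTree node]
  deriving (Eq, Show)
type TFilter node = NTree node -> [NTree node]

type TagName  = String
type TagAttrl = [(String, String)]  -- assumed representation

-- Simplified node type; XError carries an assumed Int level and a message.
data XNode = XText String | XTag TagName TagAttrl | XError Int String
  deriving (Eq, Show)

type XmlFilter = TFilter XNode

-- Assumed name: build a text node; the passed node is ignored.
mkXText :: String -> XmlFilter
mkXText s _ = [NTree (XText s) []]

-- Build an element; its children result from applying each filter in the
-- list to the passed node.
mkXElem :: TagName -> TagAttrl -> [XmlFilter] -> XmlFilter
mkXElem n al fs t = [NTree (XTag n al) (concatMap ($ t) fs)]

-- Like mkXElem, but without attributes.
mkXSElem :: TagName -> [XmlFilter] -> XmlFilter
mkXSElem n = mkXElem n []

-- Assumed name: like mkXSElem, but without children.
mkXEElem :: TagName -> XmlFilter
mkXEElem n = mkXSElem n []

-- Assumed name: build an error node that keeps the passed node as its child.
mkXError :: Int -> String -> XmlFilter
mkXError level msg t = [NTree (XError level msg) [t]]

-- Shortcuts named in the text.
tag :: TagName -> TagAttrl -> [XmlFilter] -> XmlFilter
tag = mkXElem
stag :: TagName -> [XmlFilter] -> XmlFilter
stag = mkXSElem
etag :: TagName -> XmlFilter
etag = mkXEElem
txt :: String -> XmlFilter
txt = mkXText
warn, err, fatal :: String -> XmlFilter
warn  = mkXError 1
err   = mkXError 2
fatal = mkXError 3
```

For example, stag "p" [txt "hello", etag "br"] applied to any node builds a p element with a text child followed by an empty br element. Because the child filters receive the passed node, they could just as well select parts of it instead of ignoring it.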
Selection filters
If the passed node is of type XTag, a list with an XText node is returned. This node contains the name of the element. Otherwise an empty list is returned.
If the passed node is of type XTag and there exists an attribute with the name AttrName, a list with an XText node is returned. This node contains the value of the attribute. Otherwise an empty list is returned.
The same as getXTagAttr except that it works on XDTD nodes.
If the passed node is of type XText, a list with an XText node is returned. This node contains text data. Otherwise an empty list is returned.
If the passed node is of type XCmt, a list with an XText node is returned. This node contains the text data of the comment. Otherwise an empty list is returned.
If the passed node is of type XPi, a list with an XText node is returned. This node contains the name of the processing instruction. Otherwise an empty list is returned.
If the passed node is of type XCdata, a list with an XText node is returned. This node contains the text data of the element. Otherwise an empty list is returned.
If the passed node is of type XError, a list with an XText node is returned. This node contains the error message. Otherwise an empty list is returned.
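The selection filters in this group all follow one pattern: match a particular node kind, extract a string, and wrap it in a new XText node. A minimal sketch over a simplified XNode is given below. The name getXTagAttr is taken from the text; getXTagName and getXCmt are assumed names, and the attribute list representation is an assumption.

```haskell
-- Assumed tree and filter types.
data NTree node = NTree node [NTree node]
  deriving (Eq, Show)
type TFilter node = NTree node -> [NTree node]

type TagName  = String
type AttrName = String
type TagAttrl = [(AttrName, String)]  -- assumed representation

-- Simplified node type.
data XNode = XText String | XCmt String | XTag TagName TagAttrl
  deriving (Eq, Show)

type XmlFilter = TFilter XNode

-- Helper (hypothetical): a childless text node wrapped in a result list.
textResult :: String -> [NTree XNode]
textResult s = [NTree (XText s) []]

-- Assumed name: the element name as a text node.
getXTagName :: XmlFilter
getXTagName (NTree (XTag n _) _) = textResult n
getXTagName _ = []

-- The value of the named attribute as a text node, if the attribute exists.
getXTagAttr :: AttrName -> XmlFilter
getXTagAttr a (NTree (XTag _ al) _) = maybe [] textResult (lookup a al)
getXTagAttr _ _ = []

-- Assumed name: the text of a comment as a text node.
getXCmt :: XmlFilter
getXCmt (NTree (XCmt s) _) = textResult s
getXCmt _ = []

-- Sample element: <a href="x.html"/>
sample :: NTree XNode
sample = NTree (XTag "a" [("href", "x.html")]) []
```

Returning the extracted string as an XText node rather than as a plain String keeps these functions within the common filter type, so that they can still be combined with other filters.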
Substitution filters
The constructed filter replaces the name of an XTag or XPi node by the TagName and returns a list with the modified node.
The constructed filter replaces the attribute list of an XTag, XDTD or XPi node by the TagAttrl and returns a list with the modified node.
The constructed filter modifies the name of an XTag or XPi node by applying the function (TagName -> TagName) to the name. The filter returns a list with the modified node.
The constructed filter modifies the attribute list of an XTag, XDTD or XPi node by applying the function (TagAttrl -> TagAttrl) to the attribute list. The filter returns a list with the modified node.
The constructed filter changes the value of the attribute whose name equals AttrName to AttrValue and returns a list with the modified node.
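The substitution filters could be sketched as shown below. All five function names are assumptions, and the model covers only XTag and XPi nodes, omitting XDTD. Two behaviours that the descriptions above leave open are chosen here by assumption: nodes of other kinds yield an empty list, and changing the value of an attribute that does not exist leaves the attribute list unchanged.

```haskell
-- Assumed tree and filter types.
data NTree node = NTree node [NTree node]
  deriving (Eq, Show)
type TFilter node = NTree node -> [NTree node]

type TagName   = String
type AttrName  = String
type AttrValue = String
type TagAttrl  = [(AttrName, AttrValue)]  -- assumed representation

-- Simplified node type (XDTD omitted).
data XNode = XText String | XTag TagName TagAttrl | XPi TagName TagAttrl
  deriving (Eq, Show)

type XmlFilter = TFilter XNode

-- Assumed name: apply a function to the name of an XTag or XPi node.
-- Assumption: other node kinds give an empty result.
modifyTagName :: (TagName -> TagName) -> XmlFilter
modifyTagName f (NTree (XTag n al) cs) = [NTree (XTag (f n) al) cs]
modifyTagName f (NTree (XPi  n al) cs) = [NTree (XPi  (f n) al) cs]
modifyTagName _ _ = []

-- Assumed name: replace the name by a fixed one.
replaceTagName :: TagName -> XmlFilter
replaceTagName = modifyTagName . const

-- Assumed name: apply a function to the attribute list.
modifyAttrl :: (TagAttrl -> TagAttrl) -> XmlFilter
modifyAttrl f (NTree (XTag n al) cs) = [NTree (XTag n (f al)) cs]
modifyAttrl f (NTree (XPi  n al) cs) = [NTree (XPi  n (f al)) cs]
modifyAttrl _ _ = []

-- Assumed name: replace the attribute list by a fixed one.
replaceAttrl :: TagAttrl -> XmlFilter
replaceAttrl = modifyAttrl . const

-- Assumed name: set the value of every attribute with the given name.
-- Assumption: if no such attribute exists, the list stays unchanged.
modifyAttr :: AttrName -> AttrValue -> XmlFilter
modifyAttr a v = modifyAttrl (map set)
  where
    set (n, old)
      | n == a    = (n, v)
      | otherwise = (n, old)

-- Sample element: <a href="x.html" id="l1"/>
sample :: NTree XNode
sample = NTree (XTag "a" [("href", "x.html"), ("id", "l1")]) []
```

Expressing the replace variants through the modify variants with const mirrors the relationship between the two groups of descriptions: replacing is modifying with a function that ignores the old value.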