Design and Implementation of a validating XML parser in Haskell: Master's thesis; University of Applied Sciences Wedel
After the DTD has been validated, the document itself has to be validated, that is, its elements and their attributes. This process is split into two phases: validation of the elements and their attributes, and validation of attributes of types ID, IDREF and IDREFS.
Involved modules
DocValidation: Validation of the elements and their attributes. Validation of content models is delegated to the module XmlRE.
IdValidation: Validation of attributes of types ID, IDREF and IDREFS.
Before the document is validated, a lookup table is built from the DTD. This table maps element names to their validation functions. The validation function for an element is a combination of XmlFilter functions that take the current element as input and return a list of errors. If the element meets all validation constraints, an empty list is returned. Each validation constraint listed below is represented by a single filter. These filters are combined with the combinator +++, which concatenates the results of the filters; each filter works on its own copy of the state. This approach is flexible: new validation functions can be added without affecting the existing ones.
After the initialization phase of building the validation filters and the lookup table, the whole document is traversed in preorder and every element is validated by its validation filter.
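The scheme described above can be sketched in a few lines of self-contained Haskell. The types `Element`, `Filter` and `ValidationTable` and the two example filters are simplified stand-ins invented for this illustration; they are not the toolbox's actual XmlFilter or NTree definitions. The combinator +++ is redefined locally with the concatenating behaviour described above.

```haskell
-- Simplified stand-in for an element node (hypothetical, not the toolbox's NTree).
data Element = Element
    { elemName     :: String
    , elemAttrs    :: [(String, String)]
    , elemChildren :: [Element]
    }

-- A validation filter returns a list of error messages; [] means "valid".
type Filter = Element -> [String]

-- Combine two filters by concatenating their results.
-- Both filters receive the same, unmodified input element.
infixr 5 +++
(+++) :: Filter -> Filter -> Filter
(f +++ g) e = f e ++ g e

-- Lookup table from element names to validation filters (assumed representation).
type ValidationTable = [(String, Filter)]

-- Hypothetical example constraint: the element must not have children.
mustBeEmpty :: Filter
mustBeEmpty e
    | null (elemChildren e) = []
    | otherwise             = ["Element \"" ++ elemName e ++ "\" must be empty."]

-- Hypothetical example constraint: a given attribute must be present.
requiresAttr :: String -> Filter
requiresAttr a e
    | a `elem` map fst (elemAttrs e) = []
    | otherwise = ["Attribute \"" ++ a ++ "\" is required for element \""
                   ++ elemName e ++ "\"."]

-- Traverse the tree in preorder: validate the element itself, then its children.
validateDoc :: ValidationTable -> Element -> [String]
validateDoc table e = here ++ concatMap (validateDoc table) (elemChildren e)
  where
    here = case lookup (elemName e) table of
        Just f  -> f e
        Nothing -> ["Element \"" ++ elemName e ++ "\" is not declared in DTD."]
```

A new constraint is added by writing one more `Filter` and appending it with +++ to the table entry of the elements it applies to; the existing filters stay unchanged.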
Validation of elements:
Error: Element is not declared in DTD.
Error: Root element must match value declared in DOCTYPE.
Error: Children of element do not match its content model.
Validation of attributes:
Error: Attribute is not declared.
Error: Attribute is specified multiple times.
Error: Attribute is declared as #REQUIRED, but was not specified.
Error: Attribute is declared as #FIXED, but the specified value differs from the fixed value.
Error: Value of attribute does not match the lexical constraints of its type.
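Three of the attribute checks listed above (undeclared attributes, missing #REQUIRED attributes and mismatching #FIXED values) can be illustrated with the following sketch. It is a simplified model under stated assumptions: `AttDefault`, `AttDecl` and `checkAttrs` are hypothetical names introduced here, and attributes are plain name/value pairs rather than the toolbox's data types. Duplicate attributes and lexical type constraints are not covered.

```haskell
-- Default declaration of an attribute, simplified from the DTD (hypothetical type).
data AttDefault = Required | Implied | Fixed String | Default String

-- A declared attribute: its name and its default declaration.
data AttDecl = AttDecl { attName :: String, attDefault :: AttDefault }

type Attrs = [(String, String)]

-- Check the attributes of one element against its attribute declarations.
-- Returns a list of error messages; [] means all three constraints hold.
checkAttrs :: String -> [AttDecl] -> Attrs -> [String]
checkAttrs el decls attrs = undeclared ++ concatMap checkDecl decls
  where
    -- Attributes that are specified but have no declaration.
    undeclared =
        [ "Attribute \"" ++ a ++ "\" of element \"" ++ el ++ "\" is not declared."
        | (a, _) <- attrs
        , a `notElem` map attName decls
        ]

    -- Per-declaration checks for #REQUIRED and #FIXED.
    checkDecl (AttDecl a def) = case (def, lookup a attrs) of
        (Required, Nothing) ->
            [ "Attribute \"" ++ a ++ "\" of element \"" ++ el
              ++ "\" is #REQUIRED, but was not specified." ]
        (Fixed v, Just v') | v /= v' ->
            [ "Attribute \"" ++ a ++ "\" of element \"" ++ el
              ++ "\" is #FIXED to \"" ++ v ++ "\", but has value \"" ++ v' ++ "\"." ]
        _ -> []
```

Each check only reads the declarations and the specified attributes, so the checks are independent of each other and could equally be written as separate filters and combined afterwards.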
The module IdValidation provides functions for checking the special ID/IDREF/IDREFS constraints. First, it checks that all ID values are unique: all nodes with ID attributes are collected from the document, and it is verified that no ID value occurs more than once. During a second pass over the document, it checks that every IDREF/IDREFS value matches the value of some ID attribute. For both checks, a lookup table that maps element names to their validation functions is built in the same way as in the module DocValidation.
Validation of ID/IDREF/IDREFS:
Error: Value of type ID must be unique within the document.
Error: An attribute of type IDREF/IDREFS references an identifier that does not exist in the document.
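The core of both checks can be sketched as follows. This is a minimal illustration, assuming the ID values and the IDREF/IDREFS values have already been collected from the document as plain strings; the function names are hypothetical and do not come from the module IdValidation.

```haskell
import Data.List (group, sort)

-- ID values that occur more than once (each duplicate is reported once).
duplicateIds :: [String] -> [String]
duplicateIds ids = [ x | (x:_:_) <- group (sort ids) ]

-- An IDREFS value is a whitespace-separated list of IDREF tokens.
idrefsTokens :: String -> [String]
idrefsTokens = words

-- IDREF values that do not match the value of any ID attribute.
danglingRefs :: [String] -> [String] -> [String]
danglingRefs ids refs = [ r | r <- refs, r `notElem` ids ]
```

The second check needs the complete set of ID values, which is why it can only run after the IDs have been collected from the whole document.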
The following example shows a filter that validates that all attributes of an element are unique (Unique Att Spec, section 3.1 of the XML 1.0 specification [WWW01]). The filter takes an element (an NTree XTag node), checks whether its attribute names are unique, and returns a list of errors, which is empty if no attribute is specified more than once.
Example 4-3. Validation that all attributes are unique
    noDoublicateAttributes :: XmlFilter
    noDoublicateAttributes n@(NTree (XTag name al) _)
        = doubles $ reverse al
        where
        -- Walk the reversed attribute list; report an attribute if the
        -- same name occurs again in the rest of the list.
        doubles :: TagAttrl -> XmlTrees
        doubles ((attrName,_):xs)
            = if (lookup attrName xs) == Nothing
              then doubles xs
              else (err ("Attribute \"" ++ attrName ++ "\" was already specified " ++
                         "for element \"" ++ name ++ "\".") n)
                   ++ doubles xs
        doubles [] = []

    -- Any node that is not an element is a programming error.
    noDoublicateAttributes n
        = error ("noDoublicateAttributes: illegal parameter:\n" ++ show n)