Design and Implementation of a validating XML parser in Haskell: Master's thesis; University of Applied Sciences Wedel

Chapter 4. Package hvalidator
Validation of the DTD is done by the module DTDValidation. Notations, unparsed entities, element declarations and attribute declarations are checked against the constraints of the XML 1.0 specification [WWW01].
The following checks are performed:
DTD:
Error: There exists no DOCTYPE declaration in the document.
Notations:
Error: A notation was already specified.
Unparsed entities:
Warning: Entity is declared multiple times. First declaration is used.
Error: A referenced notation must be declared.
Element declarations:
Error: Element is declared more than once.
Error: Element was specified multiple times in a content model of type mixed content.
Warning: Element used in a content model is not declared.
Attribute declarations:
Warning: There exists no element declaration for the attribute declaration.
Warning: Attribute is declared multiple times. First declaration will be used.
Warning: The same Nmtoken should not occur more than once in enumerated attribute types.
Error: Element already has an attribute of type ID. A second attribute of type ID is not permitted.
Error: ID attribute must have a declared default of #IMPLIED or #REQUIRED.
Error: Element already has an attribute of type NOTATION. A second attribute of type NOTATION is not permitted.
Error: Attribute of type NOTATION must not be declared on an element declared EMPTY.
Error: A notation must be declared when referenced in the notation type list for an attribute.
Error: The declared default value must meet the lexical constraints of the declared attribute type.
Each check is implemented by a separate function that takes the child list of the XDTD DOCTYPE node as input and returns a list of errors. Some functions take additional arguments that supply context information; for example, validating unparsed entities requires a list of all declared notations. The result of validating the DTD is the concatenation of the results of all validation functions.
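The structure described above can be illustrated with a minimal, self-contained sketch. It does not use the toolbox's XmlTree types: the data type `Decl`, the type synonym `Errors` and the functions `checkDuplicateElements`, `checkEntityNotations` and `validateDTD` are hypothetical names introduced only for this illustration. One check works on the declaration list alone, the other receives the list of declared notations as context, and the overall result is the concatenation of both.

```haskell
-- Hypothetical, simplified model of DTD declarations (not the HXT XmlTree).
data Decl
  = NotationDecl String               -- notation name
  | UnparsedEntityDecl String String  -- entity name, referenced notation
  | ElementDecl String                -- element name
  deriving (Eq, Show)

-- Errors are plain strings here; the toolbox returns error nodes instead.
type Errors = [String]

-- A check that needs no context: report every repeated element declaration.
checkDuplicateElements :: [Decl] -> Errors
checkDuplicateElements decls = go [] [ n | ElementDecl n <- decls ]
  where
    go _ [] = []
    go seen (n : ns)
      | n `elem` seen = ("Element \"" ++ n ++ "\" is declared more than once.")
                        : go seen ns
      | otherwise     = go (n : seen) ns

-- A check that needs context: the names of all declared notations.
checkEntityNotations :: [String] -> [Decl] -> Errors
checkEntityNotations notations decls =
  [ "Notation \"" ++ n ++ "\" referenced by entity \"" ++ e
    ++ "\" must be declared."
  | UnparsedEntityDecl e n <- decls
  , n `notElem` notations
  ]

-- The overall result is the concatenation of all individual results.
validateDTD :: [Decl] -> Errors
validateDTD decls = concat
  [ checkDuplicateElements decls
  , checkEntityNotations notations decls
  ]
  where
    notations = [ n | NotationDecl n <- decls ]
```

Because every check returns a list, a check that finds nothing contributes the empty list, and adding a further check only means adding one more entry to the list passed to `concat`.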
The following example shows a filter function for checking the validity constraint "No Notation on Empty Element" (section 3.3.1 of the XML 1.0 specification [WWW01]). The constraint states that an attribute of type NOTATION must not be declared for an element declared EMPTY. A notation attribute gives an application a hint about how the content of an element should be processed. The notation might refer to a program that can process the content, e.g. base-64 encoded JPEG data. Because empty elements cannot have content, attributes of type NOTATION are forbidden on them. The function checkNoNotationForEmptyElement is initialized with a list of the names of all elements declared EMPTY. The constructed filter is then applied to all XDTD ATTLIST nodes of type NOTATION, which have been selected from the DTD by another filter function.
Example 4-2. Validation that notations are not declared for EMPTY elements
    checkNoNotationForEmptyElement :: [String] -> XmlFilter
    checkNoNotationForEmptyElement emptyElems nd@(NTree (XDTD ATTLIST al) _)
        = if elemName `elem` emptyElems
          then err ("Attribute \"" ++ attName ++ "\" of type NOTATION must not be "
                    ++ "declared on the element \"" ++ elemName ++ "\" declared EMPTY.")
                   nd
          else []
          where
          elemName = getAttrValue1 a_name al
          attName  = getAttrValue1 a_value al

    checkNoNotationForEmptyElement _ nd
        = error ("checkNoNotationForEmptyElement: illegal parameter:\n" ++ show nd)
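The example above depends on the toolbox's own types (XmlFilter, NTree, XDTD). To try the idea in isolation, the following sketch restates it with plain data types. The names `AttType`, `AttDecl`, `isNotationAtt`, `checkNoNotationForEmpty` and `validateNotationAtts` are assumptions made for this illustration and are not part of the Haskell XML Toolbox. As in the original, the check is built from the list of EMPTY element names and is applied only to attribute declarations of type NOTATION that were selected beforehand.

```haskell
-- Hypothetical, simplified attribute types; only what the check needs.
data AttType
  = CDATA
  | ID
  | NOTATION [String]   -- names listed in the notation type
  deriving (Eq, Show)

-- Hypothetical stand-in for an XDTD ATTLIST node.
data AttDecl = AttDecl
  { attElem :: String   -- element the attribute is declared for
  , attName :: String   -- attribute name
  , attType :: AttType
  } deriving (Eq, Show)

-- Selects attribute declarations of type NOTATION.
isNotationAtt :: AttDecl -> Bool
isNotationAtt a = case attType a of
  NOTATION _ -> True
  _          -> False

-- The check itself: initialized with the names of all EMPTY elements,
-- it returns one error message or the empty list.
checkNoNotationForEmpty :: [String] -> AttDecl -> [String]
checkNoNotationForEmpty emptyElems a
  | attElem a `elem` emptyElems =
      [ "Attribute \"" ++ attName a ++ "\" of type NOTATION must not be "
        ++ "declared on the element \"" ++ attElem a ++ "\" declared EMPTY." ]
  | otherwise = []

-- Select the NOTATION attributes first, then apply the check to each.
validateNotationAtts :: [String] -> [AttDecl] -> [String]
validateNotationAtts emptyElems =
  concatMap (checkNoNotationForEmpty emptyElems) . filter isNotationAtt
```

Unlike the original, this sketch has no second equation that calls error for unexpected node types, because the simplified type AttDecl can only represent attribute declarations.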
The validation functions do not check whether a content model is deterministic. This requirement exists for compatibility with SGML, because some SGML tools rely on unambiguous content models. XML processors may flag non-deterministic content models as errors, but the Haskell XML Toolbox does not: it does not need deterministic content models to check whether the children of an element are valid (see Section 4.7).
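To illustrate why determinism is not required for checking children, consider the content model ((b, c) | (b, d)), which the XML 1.0 specification gives as an example of a non-deterministic model: after reading b, a parser without lookahead cannot know which branch it is in. One well-known technique that handles such models is matching with derivatives of regular expressions. The following sketch shows the idea under simplified assumptions; the type `Re` and the functions `nullable`, `deriv` and `matches` are names chosen for this illustration and are not taken from the toolbox's source.

```haskell
-- Hypothetical, simplified content model representation.
data Re
  = Empty          -- matches no sequence at all
  | Eps            -- matches the empty sequence
  | Sym String     -- matches exactly one child element with this name
  | Seq Re Re      -- (a, b)
  | Alt Re Re      -- (a | b)
  | Star Re        -- a*
  deriving (Eq, Show)

-- Does the expression accept the empty sequence?
nullable :: Re -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Sym _)   = False
nullable (Seq a b) = nullable a && nullable b
nullable (Alt a b) = nullable a || nullable b
nullable (Star _)  = True

-- Derivative: the expression that remains after reading child x.
deriv :: String -> Re -> Re
deriv _ Empty = Empty
deriv _ Eps   = Empty
deriv x (Sym y)
  | x == y    = Eps
  | otherwise = Empty
deriv x (Seq a b)
  | nullable a = Alt (Seq (deriv x a) b) (deriv x b)
  | otherwise  = Seq (deriv x a) b
deriv x (Alt a b) = Alt (deriv x a) (deriv x b)   -- both branches are kept
deriv x (Star a)  = Seq (deriv x a) (Star a)

-- A child sequence is valid if the derivative by all children is nullable.
matches :: Re -> [String] -> Bool
matches re = nullable . foldl (flip deriv) re

-- The non-deterministic content model ((b, c) | (b, d)).
ambiguous :: Re
ambiguous = Alt (Seq (Sym "b") (Sym "c")) (Seq (Sym "b") (Sym "d"))
```

Because the derivative of an alternative keeps both branches, no decision has to be made when b is read; the child sequences b, c and b, d are both accepted by `matches ambiguous`, while b alone is rejected.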