Design and Implementation of a validating XML parser in Haskell: Master's thesis; University of Applied Sciences Wedel
Chapter 3. Package hparser
The module XmlParser provides the parse functions for parsing XML files and building an XmlTree. The parser is based on Parsec [WWW29], a free monadic parser combinator library for Haskell, and does not need any look-ahead. The lexer and the parser are not separated. A notable feature of this parser is that nearly all parse functions can be implemented just as the corresponding productions are defined in the XML 1.0 specification [WWW01].
Just as filters are composed with filter combinators in the Haskell XML Toolbox, parsing in Haskell is usually done with parser combinators. These higher-order functions combine simple parser functions into more complex ones. The parser combinators are control structures that represent the operators used in the productions of syntax definitions.
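The correspondence between productions and parser combinators can be illustrated with a minimal Parsec sketch. It is not taken from the module XmlParser: the function names (sPace, eq, name, attValue, attribute) are illustrative assumptions, the Name production is simplified, and references inside attribute values are left out.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- [3] S ::= (#x20 | #x9 | #xD | #xA)+
-- The '+' operator of the production becomes the combinator many1.
sPace :: Parser String
sPace = many1 (oneOf " \t\r\n")

-- [25] Eq ::= S? '=' S?
-- The '?' operator becomes the combinator optional.
eq :: Parser ()
eq = optional sPace *> char '=' *> optional sPace

-- [5] Name, simplified for this sketch (assumption): a letter, '_' or ':'
-- followed by letters, digits, '_', ':', '.' or '-'.
name :: Parser String
name = (:) <$> (letter <|> oneOf "_:") <*> many (alphaNum <|> oneOf "_:.-")

-- [10] AttValue ::= '"' ([^<&"] | Reference)* '"'
--                 | "'" ([^<&'] | Reference)* "'"
-- The '|' operator becomes <|>, the '*' operator becomes many.
-- References (&...;) are omitted here to keep the sketch short.
attValue :: Parser String
attValue = quoted '"' <|> quoted '\''
  where
    quoted q = between (char q) (char q) (many (noneOf ['<', '&', q]))

-- [41] Attribute ::= Name Eq AttValue
-- Sequencing in the production becomes applicative sequencing.
attribute :: Parser (String, String)
attribute = (,) <$> name <* eq <*> attValue
```

With these definitions, `parse attribute "" "id = \"x1\""` yields `Right ("id","x1")`. Each parser function reads almost like the production it implements, which is the property described above.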
An XML parser has to deal with different character sets. The parser of the Haskell XML Toolbox works internally with Unicode (UTF-8) encoding. The module Unicode provides several conversion functions for different character sets.
Character sets supported by the XmlParser:
UTF-8
ISO-8859-1
US-ASCII
ISO-10646-UCS-2
UTF-16
UTF-16BE
UTF-16LE
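How input in one of these character sets might be converted to Unicode characters can be sketched as follows. This is only an illustration under stated assumptions: the type Decoder and the functions latin1ToUnicode, utf16beToUnicode and decoderFor are hypothetical and are not the actual interface of the module Unicode. Only two of the listed encodings are covered, and malformed input is not reported.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (chr, toUpper)
import Data.Word (Word8)

-- Hypothetical decoder type: raw bytes in, Unicode characters out.
type Decoder = [Word8] -> String

-- ISO-8859-1: every byte value is the code point of the same number.
latin1ToUnicode :: Decoder
latin1ToUnicode = map (chr . fromIntegral)

-- UTF-16BE: join two bytes into one 16-bit unit (high byte first),
-- then combine surrogate pairs into a single character.
-- A trailing odd byte is silently dropped in this sketch.
utf16beToUnicode :: Decoder
utf16beToUnicode = units . pairs
  where
    pairs :: [Word8] -> [Int]
    pairs (hi : lo : rest) =
      (fromIntegral hi `shiftL` 8 .|. fromIntegral lo) : pairs rest
    pairs _ = []

    units :: [Int] -> String
    units (h : l : rest)
      | h >= 0xD800 && h <= 0xDBFF && l >= 0xDC00 && l <= 0xDFFF =
          chr (0x10000 + ((h - 0xD800) `shiftL` 10) + (l - 0xDC00)) : units rest
    units (u : rest) = chr u : units rest
    units [] = []

-- Select a decoder by encoding name (case-insensitive).
decoderFor :: String -> Maybe Decoder
decoderFor enc = lookup (map toUpper enc)
  [ ("ISO-8859-1", latin1ToUnicode)
  , ("UTF-16BE", utf16beToUnicode)
  ]
```

A lookup table keyed by the encoding name keeps the selection of the conversion function separate from the parser itself, so further character sets can be added without touching the parse functions.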