Design and Implementation of a validating XML parser in Haskell: Master's thesis; University of Applied Sciences Wedel

Chapter 5. Conclusion
The developed XML parser shows that the functional approach accomplishes the task of parsing and validating XML with fewer lines of code, producing a very short and compact program compared with implementations in imperative languages. The packages hdom, hparser and hvalidator contain only about 9,000 lines of code, including many HDoc [WWW30] comments.
Although the program is compact, the code is understandable and maintainable, because it is succinct and follows a clear and simple design. The Haskell XML Toolbox introduced the very general tree data type XmlTree for representing whole XML documents in Haskell. This general data model makes it possible to base all processing of XML documents on filters. The whole XML parser presented in this thesis is based on this uniform design.
Writing a validating XML parser is quite a complex task. It must cope with different encodings, the correct processing of entities and, of course, validation. Functional programming helps master this complexity better than other methods.
In Haskell, functions are just values and have no side effects. These qualities make it easy to use higher-order functions, which take functions as arguments, return functions as results, or do both. The filter combinators, which have been adopted from HaXml, form a powerful library for combining filter functions. Because all filters of the Haskell XML Toolbox share the same type, they can be combined freely by means of filter combinators. All details of manipulating the XmlTree data structure are hidden in these higher-order functions. In effect, the filter combinators define problem-specific control structures that make it possible to program on a very high level of abstraction. Errors can be reduced, because programmers can use the filter combinators as standard functions for processing the XmlTree.
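The idea of uniformly typed filters and combinators can be illustrated with a minimal sketch. It does not reproduce the Toolbox's actual definitions: the types `Node`, `Tree` and `Filter` and the functions `isTag`, `isText`, `children`, `andThen`, `both` and `with` are simplified, illustrative names assumed for this example only.

```haskell
-- Simplified stand-in for XmlTree (assumption: the real type is richer).
data Node = Tag String | Text String
  deriving (Eq, Show)

data Tree = Tree Node [Tree]
  deriving (Eq, Show)

-- Every filter has the same type: one tree in, a list of result trees out.
type Filter = Tree -> [Tree]

-- Basic filters (illustrative names).
isTag :: String -> Filter
isTag name t@(Tree (Tag n) _) | n == name = [t]
isTag _ _ = []

isText :: Filter
isText t@(Tree (Text _) _) = [t]
isText _ = []

children :: Filter
children (Tree _ cs) = cs

-- Combinators: higher-order functions that build filters from filters.

-- Sequential composition: apply f, then apply g to every result of f.
andThen :: Filter -> Filter -> Filter
andThen f g = concatMap g . f

-- Concatenate the results of two filters applied to the same tree.
both :: Filter -> Filter -> Filter
both f g t = f t ++ g t

-- Keep only those results of f for which g yields at least one result.
with :: Filter -> Filter -> Filter
with f g = filter (not . null . g) . f

-- Example document: <doc><title>Hi</title><para>Text</para></doc>
example :: Tree
example =
  Tree (Tag "doc")
    [ Tree (Tag "title") [Tree (Text "Hi") []]
    , Tree (Tag "para")  [Tree (Text "Text") []]
    ]

-- Text nodes directly below the <title> children of the given tree.
titleText :: Filter
titleText = children `andThen` isTag "title" `andThen` children `andThen` isText
```

Because every building block has the type `Filter`, the definition of `titleText` never touches the tree structure directly; the traversal is expressed entirely through the combinators.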
Because functions are just values in Haskell, they can be constructed at runtime. The XML parser introduced in this thesis makes extensive use of parameterized filter functions that are created at runtime. The whole validation process is based on this design.
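As a rough sketch of this technique, the following example builds a validation function at runtime from a list of declarations. It is a heavily simplified illustration under stated assumptions, not the validator of this thesis: the types `Tree`, `Check` and `Decl` and the functions `checkFor` and `buildValidator` are hypothetical, and a "declaration" here only lists the element names allowed as children.

```haskell
-- Simplified stand-in for XmlTree (assumption, not the Toolbox type).
data Tree = Elem String [Tree] | Text String
  deriving (Eq, Show)

-- A check maps a tree to a list of error messages (empty list: no errors).
type Check = Tree -> [String]

-- Hypothetical, drastically simplified declaration:
-- element name and the element names allowed as its children.
type Decl = (String, [String])

-- Construct a check that is parameterized by one declaration.
checkFor :: Decl -> Check
checkFor (name, allowed) (Elem n cs)
  | n == name =
      [ "element <" ++ c ++ "> not allowed in <" ++ n ++ ">"
      | Elem c _ <- cs, c `notElem` allowed ]
checkFor _ _ = []

-- Build one validator from all declarations; the resulting function
-- applies every generated check to every node of the tree.
buildValidator :: [Decl] -> Check
buildValidator decls = go
  where
    checks = map checkFor decls
    go t = concatMap ($ t) checks ++ concatMap go (subtrees t)
    subtrees (Elem _ cs) = cs
    subtrees (Text _)    = []

-- Example declarations and the validator derived from them.
dtd :: [Decl]
dtd = [("doc", ["title", "para"]), ("title", []), ("para", [])]

validate :: Check
validate = buildValidator dtd
```

The list `checks` is an ordinary list of functions computed from data. A different set of declarations yields a different validator without any change to the code.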
It can be quite useful to apply functional programming paradigms when writing programs in imperative languages. Functional paradigms such as higher-order functions for abstraction can be realized in Java by defining an interface that has only one method. An object implementing this interface can be passed to other methods or returned by a method. In C, function pointers can be used for this task. But imperative languages are not designed to support functional programming styles, so they cannot actively support its paradigms. The main focus of these languages lies on how a problem is solved, e.g. the order in which computations are performed.
Unfortunately, functional programming and functional programming languages are not very popular. Nevertheless, two standard examples of functional programming are widely used: working with a spreadsheet program like Excel and querying a database with SQL. Another field where functional programming dominates is the transformation of SGML and XML documents. The Document Style Semantics and Specification Language (DSSSL) [WWW09] is based on the functional programming language Scheme and is very popular in the world of SGML publishing. The Extensible Stylesheet Language (XSL) [WWW07] is its counterpart for XML publishing. Functional programming languages are very well equipped for this task, because the transformation process is a functional mapping from a structured document as input to a formatted representation as output.
Sometimes it is said that functional programming languages lack libraries. We do not know of any other validating XML parser written in Haskell and hope that the framework of the Haskell XML Toolbox will be a useful tool for XML processing applications written in Haskell. The parser almost fully supports the XML 1.0 specification, with the exception of namespaces.
The Haskell XML Toolbox introduces a powerful approach for processing XML in Haskell. It generalizes the ideas of HaXml and HXML. Whole XML documents are represented as a tree of different nodes. This tree can be processed in a uniform way by using filter functions and filter combinators.
Many great ideas from the HaXml and HXML projects have been incorporated into this project. We want to thank their members for their great work and emphasize that all three projects enrich the Haskell community.
The Haskell XML Toolbox project will be maintained and extended at the University of Applied Sciences Wedel. One student has already written an XSLT processor on the basis of this project. Another student is writing a program that uses the Haskell XML Toolbox to derive Java classes from DTDs.