University of Cambridge


Static typing for XML processing

By Haruo Hosoya (Research Institute for Mathematical Sciences, Kyoto University), Benjamin Pierce, Jerome Vouillon, and Makoto Murata.

Tree regular expressions are a simple yet powerful framework to describe structural constraints on trees. This framework is also mathematically clean. In particular, its correspondence to finite tree automata is well known and thereby various desirable properties for them, such as closure under boolean operations, immediately carry over to tree regular expressions. These facts have attracted to some XML developpers and led to recently emerging schema languages such as RELAX NG.

Our work is to build up, on top of this firm ground, a statically typed language for XML processing. Specifically, we adopt tree regular expressions as types, extend these to a pattern matching facility, and perform type-checking on tree manipulating operations by exploiting properties on tree regular expressions. Notably, we use as subtyping the language inclusion between tree regular expressions, which is literally the most powerful since it allows all subtype relations that are semantically sound. We also use boolean operations such as intersection and difference for analyses on pattern matches. The resulting type system is simple, elegant, and expressive and thereby should be easy for the user to understand. I also mention our recent efforts for incorporating the attribute typing mechanism of RELAX NG to our type system.