Module Xmlm

Streaming XML codec.

A well-formed sequence of signals represents an XML document tree traversal in depth-first order (this has nothing to do with XML well-formedness). Input pulls a well-formed sequence of signals from a data source and output pushes a well-formed sequence of signals to a data destination. Functions are provided to easily transform sequences of signals to and from arborescent data structures.

Consult the features and limitations and examples of use.

v1.3.0 — homepage

References

Basic types and values

type encoding = [
| `UTF_8
| `UTF_16

Endianness determined from the BOM.

| `UTF_16BE
| `UTF_16LE
| `ISO_8859_1
| `US_ASCII
]

The type for character encodings. For `UTF_16, endianness is determined from the BOM.

type dtd = string option

The type for the optional DTD.

type name = string * string

The type for the expanded names of attributes and elements, (uri, local). An empty uri represents a name without a namespace name, i.e. an unprefixed name that is not under the scope of a default namespace.

type attribute = name * string

The type for attributes. Name and attribute data.

type tag = name * attribute list

The type for an element tag. Tag name and attribute list.

type signal = [
| `Dtd of dtd
| `El_start of tag
| `El_end
| `Data of string
]

The type for signals. A well-formed sequence of signals belongs to the language of the doc grammar:

doc ::= `Dtd tree
tree ::= `El_start child `El_end
child ::= `Data trees | trees
trees ::= tree child | epsilon

The trees production is used to express the fact that there will never be two consecutive `Data signals in the children of an element.

Input and output only deal with well-formed sequences; otherwise exceptions are raised. On output, however, consecutive `Data signals are allowed.
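The grammar can be read as a small recognizer over a list of signals. The following is only a sketch using the standard library: the types are local copies of the ones documented here, and well_formed and its strict parameter are hypothetical names, not part of the library.

```ocaml
(* Local copies of the documented types, so the sketch needs no library. *)
type name = string * string
type attribute = name * string
type tag = name * attribute list
type signal =
  [ `Dtd of string option | `El_start of tag | `El_end | `Data of string ]

(* [well_formed ~strict sigs] checks [sigs] against the doc grammar.
   With [~strict:true] two consecutive `Data signals are rejected (the
   guarantee given on input); with [~strict:false] they are accepted
   (the relaxation allowed on output). *)
let well_formed ?(strict = true) (sigs : signal list) : bool =
  (* [depth] is the current element nesting level (always >= 1 here),
     [prev_data] tells whether the previous signal was a `Data. *)
  let rec children depth prev_data = function
    | `El_start _ :: rest -> children (depth + 1) false rest
    | `Data _ :: rest ->
        if strict && prev_data then false else children depth true rest
    | `El_end :: rest ->
        (* Closing the root element must end the sequence. *)
        if depth = 1 then rest = [] else children (depth - 1) false rest
    | `Dtd _ :: _ | [] -> false
  in
  match sigs with
  | `Dtd _ :: `El_start _ :: rest -> children 1 false rest
  | _ -> false
```

A sequence such as `Dtd None, `El_start a, `Data "x", `El_end is accepted, while one that lacks the leading `Dtd or the final `El_end is rejected.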

val ns_xml : string

Namespace name value bound to the reserved "xml" prefix.

val ns_xmlns : string

Namespace name value bound to the reserved "xmlns" prefix.

val pp_dtd : Stdlib.Format.formatter -> dtd -> unit

pp_dtd ppf dtd prints an unspecified representation of dtd on ppf.

val pp_name : Stdlib.Format.formatter -> name -> unit

pp_name ppf name prints an unspecified representation of name on ppf.

val pp_attribute : Stdlib.Format.formatter -> attribute -> unit

pp_attribute ppf att prints an unspecified representation of att on ppf.

val pp_tag : Stdlib.Format.formatter -> tag -> unit

pp_tag ppf tag prints an unspecified representation of tag on ppf.

val pp_signal : Stdlib.Format.formatter -> signal -> unit

pp_signal ppf s prints an unspecified representation of s on ppf.

Input

type pos = int * int

The type for input positions. Line and column number, both start at 1.

type error = [
| `Max_buffer_size

Maximal buffer size exceeded (Sys.max_string_length).

| `Unexpected_eoi

Unexpected end of input.

| `Malformed_char_stream

Malformed underlying character stream.

| `Unknown_encoding of string

Unknown encoding.

| `Unknown_entity_ref of string

Unknown entity reference; see Character and entity references.

| `Unknown_ns_prefix of string

Unknown namespace prefix; see Namespaces under Input.

| `Illegal_char_ref of string

Illegal character reference.

| `Illegal_char_seq of string

Illegal character sequence.

| `Expected_char_seqs of string list * string

Expected one of the character sequences in the list but found another.

| `Expected_root_element

Expected the document's root element.

]

The type for input errors.

val error_message : error -> string

Converts the error to an English error message.

exception Error of pos * error

Raised on input errors.

type source = [
| `Channel of Stdlib.in_channel
| `String of int * string
| `Fun of unit -> int
]

The type for input sources. For `String, reading starts at the given integer position. For `Fun, the function must return the next byte as an int and raise End_of_file if there is no such byte.
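As an illustration of the `Fun contract, the sketch below wraps a string in a byte-returning function. It assumes the xmlm library is installed; fun_source is a hypothetical helper name, and in practice `String would be the simpler choice for in-memory data.

```ocaml
(* Requires the xmlm library. [fun_source] is a hypothetical helper. *)

(* [fun_source s] is a `Fun source that yields the bytes of [s] one by
   one and raises End_of_file once they are exhausted. *)
let fun_source (s : string) : Xmlm.source =
  let pos = ref 0 in
  `Fun
    (fun () ->
      if !pos >= String.length s then raise End_of_file
      else begin
        let byte = Char.code s.[!pos] in
        incr pos;
        byte
      end)
```

The resulting source can be passed to make_input like any other, for example make_input (fun_source "<a/>").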

type input

The type for input abstractions.

val make_input : ?enc:encoding option -> ?strip:bool -> ?ns:(string -> string option) -> ?entity:(string -> string option) -> source -> input

Returns a new input abstraction reading from the given source.

  • enc, character encoding of the document, see Encoding under Input. Defaults to None.
  • strip, strips whitespace in character data, see White space handling. Defaults to false.
  • ns is called to bind undeclared namespace prefixes, see Namespaces under Input. The default always returns None.
  • entity is called to resolve non-predefined entity references, see Character and entity references. The default always returns None.

val input : input -> signal

Inputs a signal. Repeated invocation of the function with the same input abstraction will generate a well-formed sequence of signals or an Error is raised. Furthermore, the sequence never contains two consecutive `Data signals, and their strings are always non-empty.

Deprecated. After a well-formed sequence has been input, another may be input; see eoi and Sequences of documents (deprecated).

Raises Error on input errors.
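A pull loop over input might look like the following sketch, which counts the elements of a document held in a string and turns Error into a result. It assumes the xmlm library is installed; count_elements and count_elements_result are hypothetical names chosen for the example.

```ocaml
(* Requires the xmlm library. Function names are hypothetical. *)

(* [count_elements s] is the number of elements in the XML document [s].
   It pulls signals until the root element's `El_end is seen. *)
let count_elements (s : string) : int =
  let i = Xmlm.make_input (`String (0, s)) in
  let rec loop depth count =
    match Xmlm.input i with
    | `Dtd _ | `Data _ -> loop depth count
    | `El_start _ -> loop (depth + 1) (count + 1)
    | `El_end -> if depth = 1 then count else loop (depth - 1) count
  in
  loop 0 0

(* Same as [count_elements] but reports input errors as a message
   carrying the line and column of the error position. *)
let count_elements_result (s : string) : (int, string) result =
  try Ok (count_elements s)
  with Xmlm.Error ((line, col), e) ->
    Error (Printf.sprintf "%d:%d: %s" line col (Xmlm.error_message e))
```

Catching Error at the call site keeps the loop itself free of error handling; the position comes from the exception's first component.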

val input_tree : el:(tag -> 'a list -> 'a) -> data:(string -> 'a) -> input -> 'a

If the next signal is a:

  • `Data signal, inputs it and invokes data with the character data.
  • `El_start signal, inputs the sequence of signals until its matching `El_end and invokes el and data as follows:

    • el is called on each `El_end signal with the corresponding `El_start tag and the results of the callback invocations for the element's children.
    • data is called on each `Data signal with the character data. This function won't be called twice consecutively or with the empty string.
  • Other signals, raises Invalid_argument.

Raises Error on input errors and Invalid_argument if the next signal is not `El_start or `Data.

val input_doc_tree : el:(tag -> 'a list -> 'a) -> data:(string -> 'a) -> input -> dtd * 'a

Same as input_tree but reads a complete well-formed sequence of signals.

Raises Error on input errors and Invalid_argument if the next signal is not `Dtd.
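The el and data callbacks are typically constructors of a tree type. The sketch below, which assumes the xmlm library is installed, defines a hypothetical tree type and an in_tree helper that builds a value of that type from a whole document.

```ocaml
(* Requires the xmlm library. [tree] and [in_tree] are hypothetical. *)

(* A generic document tree: elements with children, or character data. *)
type tree = E of Xmlm.tag * tree list | D of string

(* [in_tree i] inputs a complete document from [i] and returns its
   optional DTD together with the tree of the root element. *)
let in_tree (i : Xmlm.input) : Xmlm.dtd * tree =
  let el tag children = E (tag, children) in
  let data d = D d in
  Xmlm.input_doc_tree ~el ~data i
```

For instance, in_tree (Xmlm.make_input (`String (0, "<a>hi</a>"))) yields a root E node whose only child is D "hi".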

val peek : input -> signal

Same as input but doesn't remove the signal from the sequence.

Raises Error on input errors.
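Because peek leaves the signal in place, it can be used to decide whether to consume the next signal. A minimal sketch, assuming the xmlm library is installed; data_or_empty is a hypothetical helper name.

```ocaml
(* Requires the xmlm library. [data_or_empty] is a hypothetical helper. *)

(* [data_or_empty i] inputs and returns the character data if the next
   signal is a `Data; otherwise it returns "" and consumes nothing. *)
let data_or_empty (i : Xmlm.input) : string =
  match Xmlm.peek i with
  | `Data d ->
      ignore (Xmlm.input i);
      d
  | _ -> ""
```

This is convenient for elements whose content is optional text, since the caller can go on to input the following `El_end in either case.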

val eoi : input -> bool

Returns true if the end of input is reached. See Sequences of documents (deprecated).

Raises Error on input errors.

val pos : input -> pos

Current position in the input abstraction.

Output

type 'a frag = [
| `El of tag * 'a list
| `Data of string
]

The type for deconstructing data structures of type 'a.

type dest = [
| `Channel of Stdlib.out_channel
| `Buffer of Stdlib.Buffer.t
| `Fun of int -> unit
]

The type for output destinations. For `Buffer, the buffer won't be cleared. For `Fun the function is called with the output bytes as ints.

type output

The type for output abstractions.

val make_output : ?decl:bool -> ?nl:bool -> ?indent:int option -> ?ns_prefix:(string -> string option) -> dest -> output

Returns a new output abstraction writing to the given destination.

  • decl, if true the XML declaration is output. Defaults to true.
  • nl, if true a newline is output when the root element's `El_end signal is output. Defaults to false.
  • indent, indentation behaviour, see Indentation. Defaults to None.
  • ns_prefix, undeclared namespace prefix bindings, see Namespaces under Output. The default always returns None.

val output : output -> signal -> unit

Outputs a signal.

Deprecated. After a well-formed sequence of signals has been output, a new well-formed sequence can be output.

Raises Invalid_argument if the resulting signal sequence on the output abstraction is not well-formed or if a namespace name could not be bound to a prefix.

val output_depth : output -> int

output_depth o is o's current element nesting level (undefined before the first `El_start and after the last `El_end).

val output_tree : ('a -> 'a frag) -> output -> 'a -> unit

Outputs signals corresponding to a value by recursively applying the given value deconstructor.

Raises: see output.

val output_doc_tree : ('a -> 'a frag) -> output -> (dtd * 'a) -> unit

Same as output_tree but outputs a complete well-formed sequence of signals.

Raises: see output.
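The deconstructor passed to output_tree and output_doc_tree maps a value to a frag. The sketch below serializes a hypothetical tree type to a string through a `Buffer destination; it assumes the xmlm library is installed, and to_string is a name chosen for the example.

```ocaml
(* Requires the xmlm library. [tree] and [to_string] are hypothetical. *)

(* A generic document tree: elements with children, or character data. *)
type tree = E of Xmlm.tag * tree list | D of string

(* [to_string t] serializes [t] as an XML document without an XML
   declaration and without a DTD, and returns it as a string. *)
let to_string (t : tree) : string =
  let b = Buffer.create 256 in
  let o = Xmlm.make_output ~decl:false (`Buffer b) in
  (* The deconstructor: one level of the tree as a frag. *)
  let frag = function
    | E (tag, children) -> `El (tag, children)
    | D d -> `Data d
  in
  Xmlm.output_doc_tree frag o (None, t);
  Buffer.contents b
```

Since the deconstructor only exposes one level at a time, the same function works for any tree-like type, not just the one defined here.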

Functorial interface (deprecated)

type std_string = string
type std_buffer = Stdlib.Buffer.t
module type String = sig ... end

Input signature for strings.

module type Buffer = sig ... end

Input signature for internal buffers.

module type S = sig ... end

Output signature of Make.

module Make : functor (String : String) -> functor (Buffer : Buffer with type string = String.t) -> S with type string = String.t

Functor building streaming XML IO with the given strings and buffers.

Features and limitations

Input

Encoding

White space handling

Namespaces

Character and entity references

Sequences of documents (deprecated)

Miscellaneous

Output

Encoding

Namespaces

Indentation

Sequences of documents (deprecated)

Miscellaneous

Tips

Examples

Sequential processing

Tree processing

Tabular data processing