Module Jsonm

Non-blocking streaming JSON codec.

Jsonm is a non-blocking streaming codec to decode and encode the JSON data format. It can process JSON text without blocking on IO and without a complete in-memory representation of the data.

The uncut codec also processes whitespace and (non-standard) JSON with JavaScript comments.

Consult the data model, limitations and examples of use.

v1.0.1 - homepage

References

JSON data model

type lexeme = [
| `Null
| `Bool of bool
| `String of string
| `Float of float
| `Name of string
| `As
| `Ae
| `Os
| `Oe
]

The type for JSON lexemes. `As and `Ae start and end arrays and `Os and `Oe start and end objects. `Name is for the member names of objects.

A well-formed sequence of lexemes belongs to the language of the json grammar:

  json = value 
object = `Os *member `Oe
member = (`Name s) value
 array = `As *value `Ae
 value = `Null / `Bool b / `Float f / `String s / object / array
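The grammar above can be read as a small recursive-descent recognizer. The following is only a sketch: it redeclares the lexeme type locally so that it compiles without Jsonm, and the names value, array, members and well_formed are illustrative, not part of the module.

```ocaml
(* Local copy of the lexeme type so the sketch compiles on its own. *)
type lexeme =
  [ `Null | `Bool of bool | `String of string | `Float of float
  | `Name of string | `As | `Ae | `Os | `Oe ]

(* [value ls] consumes one value from the front of [ls] and returns the
   remaining lexemes, or [None] if no well-formed value starts there. *)
let rec value : lexeme list -> lexeme list option = function
  | (`Null | `Bool _ | `Float _ | `String _) :: rest -> Some rest
  | `As :: rest -> array rest
  | `Os :: rest -> members rest
  | _ -> None

(* array = `As *value `Ae, with `As already consumed. *)
and array = function
  | `Ae :: rest -> Some rest
  | ls -> (match value ls with Some rest -> array rest | None -> None)

(* object = `Os *member `Oe, with `Os already consumed. *)
and members = function
  | `Oe :: rest -> Some rest
  | `Name _ :: ls ->
      (match value ls with Some rest -> members rest | None -> None)
  | _ -> None

(* json = value: exactly one value and nothing after it. *)
let well_formed (ls : lexeme list) : bool = value ls = Some []
```

For instance, well_formed accepts [`Os; `Name "a"; `Null; `Oe] and rejects [`Os; `Null; `Oe], where the member name is missing.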

A decoder returns only well-formed sequences of lexemes; otherwise an `Error is returned. The UTF-8, UTF-16, UTF-16LE and UTF-16BE encoding schemes are supported. The strings of decoded `Name and `String lexemes are however always UTF-8 encoded. In these strings, characters originally escaped in the input are in their unescaped representation.

An encoder accepts only well-formed sequences of lexemes; otherwise Invalid_argument is raised. Only the UTF-8 encoding scheme is supported. The strings of encoded `Name and `String lexemes are assumed to be immutable and must be UTF-8 encoded; this is not checked by the module. In these strings, the delimiter characters U+0022 and U+005C ('"', '\') as well as the control characters U+0000-U+001F are automatically escaped by the encoders, as mandated by the standard.

val pp_lexeme : Stdlib.Format.formatter -> [< lexeme ] -> unit

pp_lexeme ppf l prints an unspecified non-JSON representation of l on ppf.

Decode

type error = [
| `Illegal_BOM
| `Illegal_escape of [ `Not_hex_uchar of Stdlib.Uchar.t | `Not_esc_uchar of Stdlib.Uchar.t | `Not_lo_surrogate of int | `Lone_lo_surrogate of int | `Lone_hi_surrogate of int ]
| `Illegal_string_uchar of Stdlib.Uchar.t
| `Illegal_bytes of string
| `Illegal_literal of string
| `Illegal_number of string
| `Unclosed of [ `As | `Os | `String | `Comment ]
| `Expected of [ `Comment | `Value | `Name | `Name_sep | `Json | `Eoi | `Aval of bool | `Omem of bool ]
]
val pp_error : Stdlib.Format.formatter -> [< error ] -> unit

pp_error ppf e prints an unspecified UTF-8 representation of e on ppf.

type encoding = [
| `UTF_8
| `UTF_16
| `UTF_16BE
| `UTF_16LE
]

The type for Unicode encoding schemes.

type src = [
| `Channel of Stdlib.in_channel
| `String of string
| `Manual
]

The type for input sources. With a `Manual source the client must provide input with Manual.src.

type decoder

The type for JSON decoders.

val decoder : ?encoding:[< encoding ] -> [< src ] -> decoder

decoder encoding src is a JSON decoder that inputs from src. encoding specifies the character encoding of the data. If unspecified, the encoding is guessed as suggested by the old RFC 4627 standard.

val decode : decoder -> [> `Await | `Lexeme of lexeme | `End | `Error of error ]

decode d is:

  • `Await if d has a `Manual source and awaits more input. The client must use Manual.src to provide it.
  • `Lexeme l if a lexeme l was decoded.
  • `End if the end of input was reached.
  • `Error e if a decoding error occurred. A client interested in best-effort decoding can still continue to decode after an error (see Error recovery), although the resulting sequence of `Lexemes is undefined and may not be well-formed.

The Uncut.pp_decode function can be used to inspect decode results.

Note. Repeated invocation always eventually returns `End, even in case of errors.
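A typical decoding loop over a `String source could look like the sketch below. The function name lexemes_of_string is hypothetical, and stopping at the first error is a choice of this sketch rather than something the module requires.

```ocaml
(* Collect all the lexemes of a JSON text held in a string. Decoding stops
   at the first error, which is rendered with Jsonm.pp_error. *)
let lexemes_of_string (s : string) : (Jsonm.lexeme list, string) result =
  let d = Jsonm.decoder (`String s) in
  let rec loop acc =
    match Jsonm.decode d with
    | `Lexeme l -> loop (l :: acc)
    | `End -> Ok (List.rev acc)
    | `Error e -> Error (Format.asprintf "%a" Jsonm.pp_error e)
    | `Await -> assert false (* only `Manual sources return `Await *)
  in
  loop []
```

Since the source is not `Manual, the `Await case is unreachable and is only matched to keep the pattern exhaustive.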

val decoded_range : decoder -> (int * int) * (int * int)

decoded_range d is the range of characters spanning the last `Lexeme or `Error (or `White or `Comment for an Uncut.decode) decoded by d. The range is a pair of start and end positions, each a pair of line and column numbers, respectively one- and zero-based.
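As an illustration, decoded_range can be used to locate the first decoding error in a text. This is a minimal sketch; first_error is a hypothetical helper name.

```ocaml
(* Return the starting (line, column) of the first decoding error in [s],
   or None if the whole text decodes without error. *)
let first_error (s : string) : (int * int) option =
  let d = Jsonm.decoder (`String s) in
  let rec loop () =
    match Jsonm.decode d with
    | `Lexeme _ -> loop ()
    | `End -> None
    | `Error _ ->
        (* The range covers the error that was just returned. *)
        let (line, col), _ = Jsonm.decoded_range d in
        Some (line, col)
    | `Await -> assert false (* only `Manual sources return `Await *)
  in
  loop ()
```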

val decoder_encoding : decoder -> encoding

decoder_encoding d is d's encoding.

Warning. If the decoder guesses the encoding, rely on this value only after the first `Lexeme was decoded.

val decoder_src : decoder -> src

decoder_src d is d's input source.

Encode

type dst = [
| `Channel of Stdlib.out_channel
| `Buffer of Stdlib.Buffer.t
| `Manual
]

The type for output destinations. With a `Manual destination the client must provide output storage with Manual.dst.

type encoder

The type for JSON encoders.

val encoder : ?minify:bool -> [< dst ] -> encoder

encoder minify dst is an encoder that outputs to dst. If minify is true (default) the output is made as compact as possible, otherwise the output is indented. For finer control over whitespace, use minify = true and Uncut.encode.

val encode : encoder -> [< `Await | `End | `Lexeme of lexeme ] -> [ `Ok | `Partial ]

encode e v is:

  • `Partial iff e has a `Manual destination and needs more output storage. The client must use Manual.dst to provide a new buffer and then call encode with `Await until `Ok is returned.
  • `Ok when the encoder is ready to encode a new `Lexeme or `End.

For `Manual destinations, encoding `End always returns `Partial. The client should, as usual, use Manual.dst and continue with `Await until `Ok is returned, at which point Manual.dst_rem e is guaranteed to be the size of the last provided buffer (i.e. nothing was written).
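The `Partial protocol for `Manual destinations can be sketched as follows. The signatures of Manual.dst and Manual.dst_rem are not shown on this page; the sketch assumes Manual.dst e buf pos len takes a Bytes.t buffer and that Manual.dst_rem e is the number of unwritten bytes remaining in it. The name encode_manual and the 16-byte chunk size are illustrative only.

```ocaml
(* Encode [ls] through a `Manual destination, draining a small fixed
   chunk into a Buffer.t each time the encoder asks for more storage. *)
let encode_manual (ls : Jsonm.lexeme list) : string =
  let out = Buffer.create 256 in
  let chunk = Bytes.create 16 in (* deliberately small to force `Partial *)
  let e = Jsonm.encoder `Manual in
  Jsonm.Manual.dst e chunk 0 (Bytes.length chunk);
  let rec push v =
    match Jsonm.encode e v with
    | `Ok -> ()
    | `Partial ->
        (* Bytes written so far = chunk size - remaining space. *)
        let n = Bytes.length chunk - Jsonm.Manual.dst_rem e in
        Buffer.add_subbytes out chunk 0 n;
        (* Hand the same chunk back and resume with `Await. *)
        Jsonm.Manual.dst e chunk 0 (Bytes.length chunk);
        push `Await
  in
  List.iter (fun l -> push (`Lexeme l)) ls;
  push `End; (* flushes whatever is still pending in the chunk *)
  Buffer.contents out
```

Reusing a single chunk is safe here because its contents are copied into the buffer before it is handed back to the encoder.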

Raises. Invalid_argument if a non well-formed sequence of lexemes is encoded or if `Lexeme or `End is encoded after a `Partial encode.

val encoder_dst : encoder -> dst

encoder_dst e is e's output destination.

val encoder_minify : encoder -> bool

encoder_minify e is true if e's output is minified.
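For `Buffer and `Channel destinations, encode never returns `Partial, so its result can be ignored. A minimal sketch of encoding a lexeme list to a string, assuming the hypothetical helper name to_json_string:

```ocaml
(* Encode a well-formed lexeme sequence to a JSON string.
   Raises Invalid_argument if [ls] is not well-formed. *)
let to_json_string ?(minify = true) (ls : Jsonm.lexeme list) : string =
  let b = Buffer.create 256 in
  let e = Jsonm.encoder ~minify (`Buffer b) in
  (* Only `Manual destinations return `Partial, so the result is dropped. *)
  let enc v = ignore (Jsonm.encode e v) in
  List.iter (fun l -> enc (`Lexeme l)) ls;
  enc `End; (* `End must be encoded to terminate the output *)
  Buffer.contents b
```

Strings passed in `Name and `String lexemes are given unescaped; the encoder adds the escapes for '"' and '\' itself.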

Manual sources and destinations

module Manual : sig ... end

Manual input sources and output destinations.
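As a sketch of how a `Manual source is fed, the function below decodes a JSON text supplied as a list of string chunks. The Manual signature is elided on this page; the sketch assumes Manual.src d buf pos len takes a Bytes.t buffer and that a call with len = 0 signals the end of input. The name decode_chunks is hypothetical.

```ocaml
(* Decode a JSON text that arrives in pieces. Each `Await is answered
   with the next chunk, or with an empty refill once chunks run out.
   Note that an empty string in [chunks] would also signal end of input. *)
let decode_chunks (chunks : string list) : Jsonm.lexeme list =
  let d = Jsonm.decoder `Manual in
  let rec loop chunks acc =
    match Jsonm.decode d with
    | `Lexeme l -> loop chunks (l :: acc)
    | `End -> List.rev acc
    | `Error e -> failwith (Format.asprintf "%a" Jsonm.pp_error e)
    | `Await ->
        (match chunks with
         | c :: rest ->
             Jsonm.Manual.src d (Bytes.of_string c) 0 (String.length c);
             loop rest acc
         | [] ->
             (* A zero-length refill tells the decoder the input is over. *)
             Jsonm.Manual.src d Bytes.empty 0 0;
             loop [] acc)
  in
  loop chunks []
```

Chunk boundaries may fall anywhere, including inside a literal or a string, since the decoder keeps its state between refills.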

Uncut codec

module Uncut : sig ... end

Codec with comments and whitespace.

Limitations

Decode

Encode

Error recovery

Examples

Trip

Member selection

Generic JSON representation