Jsonm is a non-blocking streaming codec to decode and encode the JSON data format. It can process JSON text without blocking on IO and without a complete in-memory representation of the data.
The uncut codec also processes whitespace and (non-standard) JSON with JavaScript comments.
Consult the data model, limitations and examples of use.
The type for JSON lexemes. `As and `Ae start and end arrays and `Os and `Oe start and end objects. `Name is for the member names of objects.
A well-formed sequence of lexemes belongs to the language of the json grammar:
json = value
object = `Os *member `Oe
member = (`Name s) value
array = `As *value `Ae
value = `Null / `Bool b / `Float f / `String s / object / array
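The grammar above can be read as a small recursive-descent check over a list of lexemes. The following is only an illustrative sketch: the lexeme type is redeclared locally so the code needs no library, and the names value, members, elements and well_formed are hypothetical helpers, not part of Jsonm.

```ocaml
(* Local copy of the lexeme type so the sketch needs no library. *)
type lexeme =
  [ `Null | `Bool of bool | `String of string | `Float of float
  | `Name of string | `As | `Ae | `Os | `Oe ]

(* Each function consumes one grammar production from the front of the
   list and returns the remaining lexemes, or None on a violation. *)
let rec value : lexeme list -> lexeme list option = function
  | (`Null | `Bool _ | `Float _ | `String _) :: rest -> Some rest
  | `Os :: rest -> members rest
  | `As :: rest -> elements rest
  | _ -> None

(* object = `Os *member `Oe, with member = (`Name s) value *)
and members = function
  | `Oe :: rest -> Some rest
  | `Name _ :: rest ->
      (match value rest with Some rest -> members rest | None -> None)
  | _ -> None

(* array = `As *value `Ae *)
and elements = function
  | `Ae :: rest -> Some rest
  | ls -> (match value ls with Some rest -> elements rest | None -> None)

(* json = value, with no lexemes left over. *)
let well_formed (ls : lexeme list) : bool = value ls = Some []
```

For instance, the lexemes `Os, `Name "a", `As, `Float 1., `Ae, `Oe form a well-formed sequence, whereas a `String directly after `Os does not, because object members must start with a `Name.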
A decoder returns only well-formed sequences of lexemes; when the input is malformed, `Errors are returned instead. The UTF-8, UTF-16, UTF-16LE and UTF-16BE encoding schemes are supported. The strings of decoded `Name and `String lexemes are however always UTF-8 encoded. In these strings, characters originally escaped in the input are in their unescaped representation.
An encoder accepts only well-formed sequences of lexemes; otherwise Invalid_argument is raised. Only the UTF-8 encoding scheme is supported. The strings of encoded `Name and `String lexemes are assumed to be immutable and must be UTF-8 encoded; this is not checked by the module. In these strings, the delimiter characters U+0022 and U+005C ('"', '\') as well as the control characters U+0000-U+001F are automatically escaped by the encoders, as mandated by the standard.
pp_lexeme ppf l prints an unspecified non-JSON representation of l on ppf.
Decode
type error = [
| `Illegal_BOM
| `Illegal_escape of [ `Not_hex_uchar of Stdlib.Uchar.t | `Not_esc_uchar of Stdlib.Uchar.t | `Not_lo_surrogate of int | `Lone_lo_surrogate of int | `Lone_hi_surrogate of int ]
decoder encoding src is a JSON decoder that inputs from src. encoding specifies the character encoding of the data. If unspecified the encoding is guessed as suggested by the old RFC4627 standard.
val decode : decoder -> [> `Await | `Lexeme of lexeme | `End | `Error of error ]
decode d is:
`Await if d has a `Manual source and awaits more input. The client must use Manual.src to provide it.
`Lexeme l if a lexeme l was decoded.
`End if the end of input was reached.
`Error e if a decoding error occurred. If the client is interested in a best-effort decoding it can still continue to decode after an error (see Error recovery), although the resulting sequence of `Lexemes is undefined and may not be well-formed.
The Uncut.pp_decode function can be used to inspect decode results.
Note. Repeated invocation always eventually returns `End, even in case of errors.
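The cases above suggest a simple decoding loop. The following is a minimal sketch, assuming the jsonm library is installed and the input is fully available as a string; lexemes_of_string is a hypothetical helper name, and stopping at the first error is a choice of this sketch rather than a requirement of the module.

```ocaml
(* Collects the lexemes of a JSON text held in a string,
   stopping at the first decoding error. *)
let lexemes_of_string (s : string) : (Jsonm.lexeme list, Jsonm.error) result =
  let d = Jsonm.decoder (`String s) in
  let rec loop acc =
    match Jsonm.decode d with
    | `Lexeme l -> loop (l :: acc)
    | `End -> Ok (List.rev acc)
    | `Error e -> Error e
    | `Await -> assert false (* only `Manual sources return `Await *)
  in
  loop []
```

A client that prefers best-effort decoding could instead record the error and keep calling decode, keeping in mind that the resulting lexeme sequence may then not be well-formed.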
val decoded_range : decoder -> (int * int) * (int * int)
decoded_range d is the range of characters spanning the last `Lexeme or `Error (or `White or `Comment for an Uncut.decode) decoded by d. Each bound of the range is a pair of line and column numbers, respectively one-based and zero-based.
encoder minify dst is an encoder that outputs to dst. If minify is true (default) the output is made as compact as possible, otherwise the output is indented. If you want better control on whitespace use minify = true and Uncut.encode.
val encode : encoder -> [< `Await | `End | `Lexeme of lexeme ] -> [ `Ok | `Partial ]
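One possible use of decoded_range is to report where decoding first failed. This is a sketch under the assumption that the jsonm library is installed; first_error_position is a hypothetical helper, and it returns only the start of the range.

```ocaml
(* Returns the (line, column) at which the first decoding error starts,
   or None if the whole input decodes without error. *)
let first_error_position (s : string) : (int * int) option =
  let d = Jsonm.decoder (`String s) in
  let rec loop () =
    match Jsonm.decode d with
    | `Lexeme _ -> loop ()
    | `End -> None
    | `Error _ ->
        (* The range covers the span of the error just returned. *)
        let (start, _stop) = Jsonm.decoded_range d in
        Some start
    | `Await -> assert false (* only `Manual sources return `Await *)
  in
  loop ()
```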
encode e v is:
`Partial iff e has a `Manual destination and needs more output storage. The client must use Manual.dst to provide a new buffer and then call encode with `Await until `Ok is returned.
`Ok when the encoder is ready to encode a new `Lexeme or `End.
For `Manual destinations, encoding `End always returns `Partial. The client should, as usual, use Manual.dst and continue with `Await until `Ok is returned, at which point Manual.dst_rem is guaranteed to be the size of the last provided buffer (i.e. nothing was written).
Raises Invalid_argument if a non well-formed sequence of lexemes is encoded or if `Lexeme or `End is encoded after a `Partial encode.
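For destinations other than `Manual, the protocol reduces to encoding each lexeme and then `End. The following is a minimal sketch, assuming the jsonm library is installed and a `Buffer destination; to_string is a hypothetical helper name.

```ocaml
(* Encodes a well-formed lexeme sequence to a string through a Buffer.
   Invalid_argument escapes if the sequence is not well-formed. *)
let to_string ?(minify = true) (ls : Jsonm.lexeme list) : string =
  let b = Buffer.create 256 in
  let e = Jsonm.encoder ~minify (`Buffer b) in
  let enc v =
    match Jsonm.encode e v with
    | `Ok -> ()
    | `Partial -> assert false (* only `Manual destinations return `Partial *)
  in
  List.iter (fun l -> enc (`Lexeme l)) ls;
  enc `End; (* `End must be encoded so that pending output is flushed *)
  Buffer.contents b
```

With a `Manual destination the same loop would additionally have to handle `Partial by calling Manual.dst and then encoding `Await until `Ok is returned.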