Segment
The type for custom segmenters. See custom.
The type for boundaries.
- `Grapheme_cluster determines extended grapheme cluster boundaries according to UAX 29 (corresponds, for most scripts, to user-perceived characters).
- `Word determines word boundaries according to UAX 29.
- `Sentence determines sentence boundaries according to UAX 29.
- `Line_break determines mandatory line breaks and line break opportunities according to UAX 14.
val pp_boundary : Stdlib.Format.formatter -> boundary -> unit
pp_boundary ppf b prints an unspecified representation of b on ppf.
The type for segmenter results. See add.
val add : t -> [ `Uchar of Stdlib.Uchar.t | `Await | `End ] -> ret
add s v is:
- `Boundary if there is a boundary at that point in the sequence of characters. The client must then call add with `Await until `Await is returned.
- `Uchar u if u is the next character in the sequence. The client must then call add with `Await until `Await is returned.
- `Await when the segmenter is ready to add a new `Uchar or `End.
- `End when `End was added and all `Boundary and `Uchar were output.
For v use `Uchar u to add a new character to the sequence to segment and `End to signal the end of the sequence. After adding one of these two values always call add with `Await until `Await or `End is returned.
- raises Invalid_argument if `Uchar or `End is added while the last add did not return `Await, or if an `Uchar or `End is added after an `End was already added.
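The add protocol above can be sketched as a small driver loop. This is a minimal sketch, not a definitive implementation. It assumes the functions documented here are exposed by a module named Uuseg and that a constructor Uuseg.create : [< boundary ] -> t exists; neither name appears in this section. The helper name segments is hypothetical.

```ocaml
(* Assumptions: the documented module is named Uuseg and provides
   Uuseg.create (not shown in this section). [segments] is a
   hypothetical helper written for this example. *)

(* [segments b us] splits the characters [us] into the segments
   delimited by boundaries of kind [b]. *)
let segments b us =
  let seg = Uuseg.create b in
  (* Close the segment under construction [cur] (kept reversed). *)
  let flush cur acc = if cur = [] then acc else List.rev cur :: acc in
  (* Add [v], then keep adding `Await until `Await or `End is returned. *)
  let rec drain (cur, acc) v =
    match Uuseg.add seg v with
    | `Uchar u -> drain (u :: cur, acc) `Await
    | `Boundary -> drain ([], flush cur acc) `Await
    | `Await | `End -> (cur, acc)
  in
  let st = List.fold_left (fun st u -> drain st (`Uchar u)) ([], []) us in
  (* Signal the end of the sequence and collect what is still buffered. *)
  let cur, acc = drain st `End in
  List.rev (flush cur acc)
```

Under the same assumptions, segments `Word us would return the words of us, each as a list of Uchar.t values.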
val mandatory : t -> bool
mandatory s is true if the last `Boundary returned by add was mandatory. This function only makes sense for `Line_break segmenters or `Custom segmenters that sport that notion. For other segmenters or if no `Boundary was returned so far, true is returned.
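As an illustration of mandatory with a `Line_break segmenter, the following sketch records each boundary position together with its mandatory flag. It assumes the module is named Uuseg and provides Uuseg.create (neither is shown in this section). The helper name line_breaks is hypothetical.

```ocaml
(* Assumptions: the documented module is named Uuseg and provides
   Uuseg.create. [line_breaks] is a hypothetical helper. *)

(* [line_breaks us] is the list of [(pos, mandatory)] pairs, where [pos]
   is the number of characters output before the boundary. *)
let line_breaks us =
  let seg = Uuseg.create `Line_break in
  let rec drain (n, acc) v =
    match Uuseg.add seg v with
    | `Uchar _ -> drain (n + 1, acc) `Await
    | `Boundary ->
        (* Query [mandatory] right after the `Boundary it refers to. *)
        drain (n, (n, Uuseg.mandatory seg) :: acc) `Await
    | `Await | `End -> (n, acc)
  in
  let st = List.fold_left (fun st u -> drain st (`Uchar u)) (0, []) us in
  List.rev (snd (drain st `End))
```

A client could use the flag to distinguish forced breaks (for example after a line feed) from mere break opportunities (for example after a space).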
copy s is a copy of s in its current state. Subsequent adds on s do not affect the copy.
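The independence of a copy can be checked with a short sketch: after copying, the same input is added to the original and then to the copy, and both should return the same values. It assumes the module is named Uuseg, that Uuseg.create exists, and that copy has the signature t -> t (its val line is not shown in this section). The names drain and copy_is_independent are hypothetical.

```ocaml
(* Assumptions: the documented module is named Uuseg, provides
   Uuseg.create, and Uuseg.copy : t -> t. *)

(* Add [v] to [seg], then `Await until `Await or `End is returned.
   Returns every value the segmenter returned, in order. *)
let drain seg v =
  let rec loop acc v =
    match Uuseg.add seg v with
    | (`Uchar _ | `Boundary) as r -> loop (r :: acc) `Await
    | (`Await | `End) as r -> List.rev (r :: acc)
  in
  loop [] v

let copy_is_independent () =
  let a = Uchar.of_char 'a' and b = Uchar.of_char 'b' in
  let s = Uuseg.create `Word in
  ignore (drain s (`Uchar a));
  let c = Uuseg.copy s in
  (* Advance the original first; the copy must still be in the old state. *)
  let from_s = drain s (`Uchar b) in
  let from_c = drain c (`Uchar b) in
  from_s = from_c
```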
val pp_ret : Stdlib.Format.formatter -> [< ret ] -> unit
pp_ret ppf v prints an unspecified representation of v on ppf.
Custom segmenters
val custom : ?mandatory:('a -> bool) -> name:string -> create:(unit -> 'a) -> copy:('a -> 'a) -> add:('a -> [ `Uchar of Stdlib.Uchar.t | `Await | `End ] -> ret) -> unit -> custom
custom ?mandatory ~name ~create ~copy ~add () is a custom segmenter.
- name is a name to identify the segmenter.
- create is called when the segmenter is created. It should return a custom segmenter value.
- copy is called with the segmenter value whenever the segmenter is copied. It should return a copy of the segmenter value.
- mandatory is called with the segmenter value to define the result of the mandatory function. Defaults to a function that always returns true.
- add is called with the segmenter value to define the result of the add function. The returned value should respect the semantics of add. Use the functions err_exp_await and err_ended to raise Invalid_argument exceptions in add's error cases.
val err_exp_await : [< ret ] -> 'a
err_exp_await fnd should be used by custom segmenters when the client tries to add an `Uchar or `End while the last returned value was not an `Await.
val err_ended : [< ret ] -> 'a
err_ended fnd should be used by custom segmenters when the client tries to add `Uchar or `End after `End was already added.
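To make the custom segmenter contract concrete, here is a minimal sketch of a segmenter that reports a boundary after every character. It assumes the module is named Uuseg and provides Uuseg.create (neither is shown in this section). The state type and the name after_each_char are hypothetical. The add function follows the semantics of add described earlier and delegates its two error cases to err_ended and err_exp_await.

```ocaml
(* Assumptions: the documented module is named Uuseg and provides
   Uuseg.create. [state] and [after_each_char] are hypothetical. *)

type state =
  { mutable boundary_due : bool; (* a `Boundary is still to be returned *)
    mutable ready : bool;        (* the last value returned was `Await *)
    mutable ended : bool }       (* `End was already added *)

let after_each_char : Uuseg.custom =
  let create () = { boundary_due = false; ready = true; ended = false } in
  let copy s =
    { boundary_due = s.boundary_due; ready = s.ready; ended = s.ended }
  in
  let add s v =
    (* Remember whether the value we return is `Await. *)
    let ret (r : Uuseg.ret) = s.ready <- (r = `Await); r in
    match v with
    | (`Uchar _ | `End) as a when s.ended -> Uuseg.err_ended a
    | (`Uchar _ | `End) as a when not s.ready -> Uuseg.err_exp_await a
    | `Uchar u -> s.boundary_due <- true; ret (`Uchar u)
    | `End -> s.ended <- true; ret `End
    | `Await ->
        if s.boundary_due then (s.boundary_due <- false; ret `Boundary)
        else if s.ended then ret `End
        else ret `Await
  in
  (* [?mandatory] is omitted, so the documented default applies. *)
  Uuseg.custom ~name:"after-each-char" ~create ~copy ~add ()
```

Under these assumptions, a segmenter would be obtained with Uuseg.create (`Custom after_each_char) and driven with add like any built-in one.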