Segment
The type for custom segmenters. See custom.
The type for boundaries.
- `Grapheme_cluster determines extended grapheme cluster boundaries according to UAX 29 (for most scripts, these correspond to user-perceived characters).
- `Word determines word boundaries according to UAX 29.
- `Sentence determines sentence boundaries according to UAX 29.
- `Line_break determines mandatory line breaks and line break opportunities according to UAX 14.
val pp_boundary : Stdlib.Format.formatter -> boundary -> unit
pp_boundary ppf b prints an unspecified representation of b on ppf.
The type for segmenter results. See add.
val add : t -> [ `Uchar of Stdlib.Uchar.t | `Await | `End ] -> ret
add s v is:
- `Boundary if there is a boundary at that point in the sequence of characters. The client must then call add with `Await until `Await is returned.
- `Uchar u if u is the next character in the sequence. The client must then call add with `Await until `Await is returned.
- `Await when the segmenter is ready to add a new `Uchar or `End.
- `End when `End was added and all `Boundary and `Uchar values were output.
For v, use `Uchar u to add a new character to the sequence to segment and `End to signal the end of the sequence. After adding one of these two values, always call add with `Await until `Await or `End is returned.
- raises Invalid_argument if a `Uchar or `End is added while the last add did not return `Await, or if a `Uchar or `End is added after an `End was already added.
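The protocol above can be sketched as a small client loop. This is a minimal sketch that relies only on the add signature documented here: segments is a hypothetical helper (not part of the API) that accepts any add function of that shape, and toy_add is an equally hypothetical segmenter, used only so the sketch runs on its own, that reports a boundary before every character. With the real library you would pass add together with a segmenter of type t.

```ocaml
(* Segmenter results, mirroring the documented ret type. *)
type ret = [ `Await | `Boundary | `End | `Uchar of Uchar.t ]

(* Hypothetical helper: feed [us] to segmenter [s] through [add], following
   the protocol above, and collect the segments delimited by `Boundary. *)
let segments
    (add : 's -> [ `Uchar of Uchar.t | `Await | `End ] -> ret)
    (s : 's) (us : Uchar.t list) : Uchar.t list list =
  let segs = ref [] and cur = ref [] in
  (* Close the current segment, if it is non-empty. *)
  let flush () =
    if !cur <> [] then begin segs := List.rev !cur :: !segs; cur := [] end
  in
  (* Add [v], then keep calling add with `Await until it returns `Await
     (ready for more input) or `End (everything was output). *)
  let rec drain v =
    match add s v with
    | `Boundary -> flush (); drain `Await
    | `Uchar u -> cur := u :: !cur; drain `Await
    | `Await | `End -> ()
  in
  List.iter (fun u -> drain (`Uchar u)) us;
  drain `End;
  flush ();
  List.rev !segs

(* Hypothetical toy segmenter (not part of the API): a boundary before
   every character, i.e. one segment per code point. *)
type toy = { mutable pending : Uchar.t option; mutable ended : bool }

let toy_create () = { pending = None; ended = false }

let toy_add t = function
  | `Uchar u -> t.pending <- Some u; `Boundary
  | `End -> t.ended <- true; `End
  | `Await ->
      (match t.pending with
       | Some u -> t.pending <- None; `Uchar u
       | None -> if t.ended then `End else `Await)

let () =
  let us = List.map Uchar.of_char [ 'a'; 'b'; 'c' ] in
  let segs = segments toy_add (toy_create ()) us in
  assert (List.length segs = 3)
```

Because drain only stops on `Await or `End, the loop never adds a new `Uchar or `End while output is still pending, which is exactly the situation that raises Invalid_argument.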
val mandatory : t -> bool
mandatory s is true if the last `Boundary returned by add was mandatory. This function only makes sense for `Line_break segmenters or for `Custom segmenters that have that notion. For other segmenters, or if no `Boundary was returned so far, true is returned.
val copy : t -> t
copy s is a copy of s in its current state. Subsequent adds on s do not affect the copy.
val pp_ret : Stdlib.Format.formatter -> [< ret ] -> unit
pp_ret ppf v prints an unspecified representation of v on ppf.
Custom segmenters
val custom : ?mandatory:('a -> bool) -> name:string -> create:(unit -> 'a) -> copy:('a -> 'a) -> add:('a -> [ `Uchar of Stdlib.Uchar.t | `Await | `End ] -> ret) -> unit -> custom
custom ~mandatory ~name ~create ~copy ~add () is a custom segmenter.
- name is a name to identify the segmenter.
- create is called when the segmenter is created; it should return a custom segmenter value.
- copy is called with the segmenter value whenever the segmenter is copied. It should return a copy of the segmenter value.
- mandatory is called with the segmenter value to define the result of the mandatory function. Defaults to a function that always returns true.
- add is called with the segmenter value to define the result of the add function. The returned value should respect the semantics of add. Use the functions err_exp_await and err_ended to raise Invalid_argument in add's error cases.
val err_exp_await : [< ret ] -> 'a
err_exp_await fnd should be used by custom segmenters when the client tries to add a `Uchar or `End while the last returned value was not an `Await.
val err_ended : [< ret ] -> 'a
err_ended fnd should be used by custom segmenters when the client tries to add a `Uchar or `End after `End was already added.
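The custom segmenter contract can be illustrated with the state functions one would pass to custom. This is a sketch under stated assumptions: the segmenter itself is hypothetical (not provided by the library) and reports a mandatory boundary after U+000A LINE FEED and a non-mandatory one after U+0020 SPACE. The local err_exp_await and err_ended are stand-ins with the documented type [< ret ] -> 'a so the code runs without the library, and passing them the offending value is an assumption about the argument's role.

```ocaml
(* Result type, mirroring the documented ret. *)
type ret = [ `Await | `Boundary | `End | `Uchar of Uchar.t ]

(* Stand-ins with the documented types; in real code use the library's
   err_exp_await and err_ended. Passing the offending value is assumed. *)
let err_exp_await (_ : [< ret ]) = invalid_arg "expected `Await"
let err_ended (_ : [< ret ]) = invalid_arg "`End already added"

(* Hypothetical segmenter state. *)
type state = {
  mutable prev : Uchar.t option;    (* last character added *)
  mutable pending : Uchar.t option; (* character to output after a `Boundary *)
  mutable awaiting : bool;          (* client must now call add with `Await *)
  mutable ended : bool;             (* `End was added *)
  mutable mandatory : bool;         (* was the last `Boundary mandatory? *)
}

let lf = Uchar.of_int 0x000A
let sp = Uchar.of_int 0x0020

(* create: fresh state; mandatory is true until a boundary is returned. *)
let create () =
  { prev = None; pending = None; awaiting = false; ended = false;
    mandatory = true }

(* copy: every field holds an immutable value, so a fresh record suffices. *)
let copy st =
  { prev = st.prev; pending = st.pending; awaiting = st.awaiting;
    ended = st.ended; mandatory = st.mandatory }

(* mandatory: what the library's mandatory function would report. *)
let mandatory st = st.mandatory

(* add: follows the documented add semantics and error cases. *)
let add st = function
  | `Uchar u ->
      if st.ended then err_ended (`Uchar u)
      else if st.awaiting then err_exp_await (`Uchar u)
      else begin
        st.awaiting <- true;
        let prev = st.prev in
        st.prev <- Some u;
        match prev with
        | Some p when Uchar.equal p lf ->
            (* break after LF: mandatory boundary, then output u *)
            st.mandatory <- true; st.pending <- Some u; `Boundary
        | Some p when Uchar.equal p sp ->
            (* break after SPACE: optional boundary, then output u *)
            st.mandatory <- false; st.pending <- Some u; `Boundary
        | _ -> `Uchar u
      end
  | `End ->
      if st.ended then err_ended `End
      else if st.awaiting then err_exp_await `End
      else (st.ended <- true; `End)
  | `Await ->
      (match st.pending with
       | Some u -> st.pending <- None; `Uchar u
       | None ->
           st.awaiting <- false;
           if st.ended then `End else `Await)

(* Registration with the library would then look roughly like:
   custom ~mandatory ~name:"lf-space" ~create ~copy ~add () *)

(* Example driver: the mandatory flag of each boundary found in s. *)
let boundaries s =
  let st = create () in
  let out = ref [] in
  let rec drain v =
    match add st v with
    | `Boundary -> out := mandatory st :: !out; drain `Await
    | `Uchar _ -> drain `Await
    | `Await | `End -> ()
  in
  String.iter (fun c -> drain (`Uchar (Uchar.of_char c))) s;
  drain `End;
  List.rev !out
```

For example, boundaries "a b\nc" yields [false; true]: an optional break before b and a mandatory one before c. A segmenter whose state holds buffers or other mutable structures would have to duplicate them in copy, otherwise subsequent adds on the original would leak into the copy.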