Module `B0_std.Conv`

Value converters.

A value converter describes how to encode and decode OCaml values to a binary representation and to a textual, human specifiable, s-expression based representation.

Notation. Given a value v and a converter c we write [v]_c for the textual encoding of v according to c.

Low-level encoders and decoders

exception Error of int * int * string

The exception for conversion errors. This exception is raised both by encoders and decoders with raise_notrace. The integers indicate a byte index range in the input on decoding errors; they are meaningless on encoding errors.

Note. This exception is used for defining converters. High-level converting functions do not raise but use result values to report errors.
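
The split between a raising low-level layer and result-returning high-level functions can be sketched as follows. This is only an illustration using the standard library: the exception is redeclared locally, and dec_digit and to_result are hypothetical names that are not part of B0_std.

```ocaml
(* Local stand-in for Conv.Error: first byte index, last byte index, message. *)
exception Error of int * int * string

(* Hypothetical low-level decoder: reads one decimal digit at index [i] and
   raises without a backtrace on failure. *)
let dec_digit (s : string) (i : int) : int =
  if i < 0 || i >= String.length s
  then raise_notrace (Error (i, i, "unexpected end of input"))
  else match s.[i] with
  | '0' .. '9' as c -> Char.code c - Char.code '0'
  | c -> raise_notrace (Error (i, i, Printf.sprintf "%C: not a digit" c))

(* Hypothetical high-level wrapper: catches the exception and reports the
   error as a result value. [Stdlib.Error] is spelled out because the local
   exception shadows the result constructor of the same name. *)
let to_result (dec : string -> int -> 'a) (s : string) : ('a, string) result =
  match dec s 0 with
  | v -> Ok v
  | exception Error (first, last, msg) ->
      Stdlib.Error (Printf.sprintf "%d-%d: %s" first last msg)
```

With these definitions to_result dec_digit "7" evaluates to Ok 7, while a non-digit input yields an Error carrying the byte range and message.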

module Bin : sig ... end: Binary codecs.

module Txt : sig ... end: Textual codecs.

Converters

type 'a t: The type for converters.

val v : kind:string -> docvar:string -> 'a Bin.enc -> 'a Bin.dec -> 'a Txt.enc -> 'a Txt.dec -> 'a t: v ~kind ~docvar bin_enc bin_dec txt_enc txt_dec is a value converter using bin_enc, bin_dec, txt_enc, txt_dec for binary and textual conversions. kind documents the kind of converted value and docvar a meta-variable used in documentation to stand for these values (use uppercase e.g. INT for integers).

val kind : 'a t -> string: kind c is the documented kind of value converted by c.

val docvar : 'a t -> string: docvar c is the documentation meta-variable for values converted by c.

val bin_enc : 'a t -> 'a Bin.enc: bin_enc c is the binary encoder of c.

val bin_dec : 'a t -> 'a Bin.dec: bin_dec c is the binary decoder of c.

val txt_enc : 'a t -> 'a Txt.enc: txt_enc c is the textual encoder of c.

val txt_dec : 'a t -> 'a Txt.dec: txt_dec c is the textual decoder of c.

val with_kind : ?⁠docvar:string -> string -> 'a t -> 'a t: with_kind ~docvar k c is c with kind k and documentation meta-variable docvar (defaults to docvar c).

val with_docvar : string -> 'a t -> 'a t: with_docvar docvar c is c with documentation meta-variable docvar.

val with_conv : kind:string -> docvar:string -> ('b -> 'a) -> ('a -> 'b) -> 'a t -> 'b t: with_conv ~kind ~docvar to_t of_t t_conv is a converter for type 'b given a converter t_conv for type 'a and conversion functions from and to type 'b. The conversion functions should raise Error if they are not total.

Converting

val to_bin : ?⁠buf:Stdlib.Buffer.t -> 'a t -> 'a -> (string, string) Stdlib.result: to_bin c v binary encodes v using c. buf is used as the internal buffer if specified (it is Buffer.cleared before usage).

val of_bin : 'a t -> string -> ('a, string) Stdlib.result: of_bin c s binary decodes a value from s using c.

val to_txt : ?⁠buf:Stdlib.Buffer.t -> 'a t -> 'a -> (string, string) Stdlib.result: to_txt c v textually encodes v using c. buf is used as the internal buffer if specified (it is Buffer.cleared before usage).

val of_txt : 'a t -> string -> ('a, string) Stdlib.result: of_txt c s textually decodes a value from s using c.

val to_pp : 'a t -> 'a Fmt.t: to_pp c is a formatter using to_txt to format values. Any error that might occur is printed in the output using the s-expression (conv-error [c]_kind [e]) with [c]_kind the atom for the value kind c and [e] the atom for the error message.

Predefined converters

val bool : bool t: bool converts booleans. Textual conversions represent booleans with the atoms true and false.

val byte : int t: byte converts a byte. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [0;255].

val int : int t

int converts signed OCaml integers. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [-2^{Sys.int_size-1};2^{Sys.int_size-1}-1].

Warning. A large integer encoded on a 64-bit platform may fail to decode on a 32-bit platform, use int31 or int64 if this is a problem.

val int31 : int t: int31 converts signed 31-bit integers. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [-2³⁰;2³⁰-1].
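
The range checks described for byte and int31 can be sketched with the standard library alone. The helper names below (int_in_range, byte_of_atom, int31_of_atom) are hypothetical and only model the documented behaviour; they are not the library's decoders.

```ocaml
(* Hypothetical helper: parse an atom with the syntax of int_of_string and
   check the result against an inclusive range. *)
let int_in_range ~(min : int) ~(max : int) (atom : string)
  : (int, string) result
  =
  match int_of_string_opt atom with
  | None -> Error (Printf.sprintf "%S: not an integer" atom)
  | Some i when i < min || i > max ->
      Error (Printf.sprintf "%d: not in range [%d;%d]" i min max)
  | Some i -> Ok i

(* Range [0;255]. *)
let byte_of_atom = int_in_range ~min:0 ~max:255

(* Range [-2^30;2^30-1]. *)
let int31_of_atom = int_in_range ~min:(-(1 lsl 30)) ~max:((1 lsl 30) - 1)
```

Since int_of_string accepts prefixed literals, byte_of_atom "0xff" succeeds with 255, whereas byte_of_atom "256" is rejected by the range check.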

val int32 : int32 t: int32 converts signed 32-bit integers. Textual decoding parses an atom according to the syntax of Int32.of_string. Conversions fail if the integer is not in the range [-2³¹;2³¹-1].

val int64 : int64 t: int64 converts signed 64-bit integers. Textual decoding parses an atom according to the syntax of Int64.of_string. Conversions fail if the integer is not in the range [-2⁶³;2⁶³-1].

val float : float t: float converts floating point numbers. Textual decoding parses an atom using float_of_string.

val string_bytes : string t

string_bytes converts OCaml strings as byte sequences. Textual conversion represents the bytes of s with the s-expression (hex [s]_hex) with [s]_hex the atom resulting from String.Ascii.to_hex s. See also atom and string_only.

Warning. A large string encoded on a 64-bit platform may fail to decode on a 32-bit platform.
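
As an illustration of the (hex [s]_hex) form, here is a minimal standard library sketch. String.Ascii.to_hex belongs to B0_std; the to_hex stand-in below is hypothetical and assumes lowercase hexadecimal digits, which the text above does not specify.

```ocaml
(* Stand-in for String.Ascii.to_hex. Lowercase digits are an assumption. *)
let to_hex (s : string) : string =
  let digits = "0123456789abcdef" in
  let b = Buffer.create (2 * String.length s) in
  String.iter
    (fun c ->
       let n = Char.code c in
       Buffer.add_char b digits.[n lsr 4];
       Buffer.add_char b digits.[n land 0xF])
    s;
  Buffer.contents b

(* Textual form described for string_bytes: (hex [s]_hex). *)
let string_bytes_txt (s : string) : string =
  Printf.sprintf "(hex %s)" (to_hex s)
```

Under these assumptions the string "abc" is written (hex 616263).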

val atom : string t

atom converts strings assumed to represent UTF-8 encoded Unicode text; but the encoding is not checked. Textual conversions represent strings as atoms. See also string_bytes and string_only.

Warning. A large atom encoded on a 64-bit platform may fail to decode on a 32-bit platform.

val atom_non_empty : string t: atom_non_empty is like atom but ensures the atom is not empty.

val option : ?⁠kind:string -> ?⁠docvar:string -> 'a t -> 'a option t: option c converts optional values converted with c. Textual conversions represent None with the atom none and Some v with the s-expression (some [v]_c).

val some : 'a t -> 'a option t: some c wraps decodes of c with Option.some. Warning. None can't be converted in either direction, use option for this.

val result : ?⁠kind:string -> ?⁠docvar:string -> 'a t -> 'b t -> ('a, 'b) Stdlib.result t: result ok error converts result values with ok and error. Textual conversions represent Ok v with the s-expression (ok [v]_ok) and Error e with (error [e]_error).

val list : ?⁠kind:string -> ?⁠docvar:string -> 'a t -> 'a list t

list c converts a list of values converted with c. Textual conversions represent a list [v0; ...; vn] by the s-expression ([v0]_c ... [vn]_c).

Warning. A large list encoded on a 64-bit platform may fail to decode on a 32-bit platform.

val array : ?⁠kind:string -> ?⁠docvar:string -> 'a t -> 'a array t

array c is like list but converts arrays.

Warning. A large array encoded on a 64-bit platform may fail to decode on a 32-bit platform.

val pair : ?⁠kind:string -> ?⁠docvar:string -> 'a t -> 'b t -> ('a * 'b) t: pair c0 c1 converts pairs of values converted with c0 and c1. Textual conversions represent a pair (v0, v1) by the s-expression ([v0]_c0 [v1]_c1).
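
The textual forms given for option, result, list and pair compose in the obvious way. The sketch below models only the encoding direction with plain functions to strings; the type 'a txt and the functions are hypothetical stand-ins named after the converters, not the B0_std values.

```ocaml
(* Each "encoder" here simply maps a value to its textual form. *)
type 'a txt = 'a -> string

let bool : bool txt = string_of_bool
let int : int txt = string_of_int

(* None is the atom none, Some v is (some [v]_c). *)
let option (c : 'a txt) : 'a option txt = function
  | None -> "none"
  | Some v -> Printf.sprintf "(some %s)" (c v)

(* Ok v is (ok [v]_ok), Error e is (error [e]_error). *)
let result (ok : 'a txt) (error : 'b txt) : ('a, 'b) Stdlib.result txt =
  function
  | Ok v -> Printf.sprintf "(ok %s)" (ok v)
  | Error e -> Printf.sprintf "(error %s)" (error e)

(* [v0; ...; vn] is ([v0]_c ... [vn]_c). *)
let list (c : 'a txt) : 'a list txt = fun vs ->
  "(" ^ String.concat " " (List.map c vs) ^ ")"

(* (v0, v1) is ([v0]_c0 [v1]_c1). *)
let pair (c0 : 'a txt) (c1 : 'b txt) : ('a * 'b) txt = fun (v0, v1) ->
  Printf.sprintf "(%s %s)" (c0 v0) (c1 v1)
```

For example list (pair int (option bool)) applied to [1, Some true; 2, None] gives ((1 (some true)) (2 none)).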

val enum : kind:string -> docvar:string -> ?⁠eq:('a -> 'a -> bool) -> (string * 'a) list -> 'a t: enum ~kind ~docvar ~eq vs converts values present in vs. eq is used to test equality among values (defaults to ( = )). The list length should not exceed 256. Textual conversions use the strings of the pairs in vs as atoms to encode the corresponding value.
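
A minimal model of the textual side of enum, using the standard library only. The functions enum_enc and enum_dec and the color type are hypothetical and serve as an illustration of the lookup in vs, not as the library's implementation.

```ocaml
(* Encoding: find the atom paired with a value equal (according to [eq])
   to [v]. *)
let enum_enc ?(eq = ( = )) (vs : (string * 'a) list) (v : 'a)
  : (string, string) result
  =
  match List.find_opt (fun (_, v') -> eq v v') vs with
  | Some (atom, _) -> Ok atom
  | None -> Error "value not in enumeration"

(* Decoding: find the value paired with the atom. *)
let enum_dec (vs : (string * 'a) list) (atom : string) : ('a, string) result =
  match List.assoc_opt atom vs with
  | Some v -> Ok v
  | None ->
      Error (Printf.sprintf "%S: expected one of %s" atom
               (String.concat ", " (List.map fst vs)))

(* Hypothetical enumeration used for illustration. *)
type color = Red | Green | Blue
let colors = ["red", Red; "green", Green; "blue", Blue]
```

Here enum_enc colors Green yields the atom green and enum_dec colors "blue" yields Blue; an unknown atom produces an error listing the expected atoms.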

Non-composable predefined converters

Textual conversions performed by the following converters cannot be composed; they do not respect the syntax of s-expression atoms. They can be used for direct conversions when one does not want to be subject to the syntactic constraints of s-expressions, for example when parsing command line interface arguments or environment variables.

val string_only : string t

string_only converts OCaml strings. Textual conversion is not composable, use string_bytes or atom instead. Textual encoding passes the string as is and decoding ignores the initial starting point and returns the whole input string.

Warning. A large string encoded on a 64-bit platform may fail to decode on a 32-bit platform.

S-expressions syntax

S-expressions are a general way of describing data via atoms (sequences of characters) and lists delimited by parentheses. Here are a few examples of s-expressions and their syntax:

this-is-an-atom
(this is a list of seven atoms)
(this list contains (a nested) list)

; This is a comment
; Anything that follows a semi-colon is ignored until the next line

(this list ; has three atoms and an embedded ()
 comment)

"this is a quoted atom, it can contain spaces ; and ()"

"quoted atoms can be split ^
 across lines or contain Unicode esc^u{0061}pes"

We define the syntax of s-expressions over a sequence of Unicode characters in which all US-ASCII control characters except whitespace are forbidden in unescaped form.

Note. This module assumes the sequence of Unicode characters is encoded as UTF-8 although it doesn't check this for now.

S-expressions and sequences thereof

An s-expression is either an atom or a list of s-expressions interspaced with whitespace and comments. A sequence of s-expressions is a succession of s-expressions interspaced with whitespace and comments.

These elements are informally described below and finally made precise via an ABNF grammar.

Whitespace

Whitespace is a sequence of whitespace characters, namely, space ' ' (U+0020), tab '\t' (U+0009), line feed '\n' (U+000A), vertical tab (U+000B), form feed (U+000C) and carriage return '\r' (U+000D).

Comments

Unless it occurs inside an atom in quoted form (see below) anything that follows a semicolon ';' (U+003B) is ignored until the next end of line, that is either a line feed '\n' (U+000A), a carriage return '\r' (U+000D) or a carriage return and a line feed "\r\n" (<U+000D,U+000A>).

(this is not a comment) ; This is a comment
(this is not a comment)

Atoms

An atom represents ground data as a string of Unicode characters. It can, via escapes, represent any sequence of Unicode characters, including control characters and U+0000. It cannot represent an arbitrary byte sequence except via a client-defined encoding convention (e.g. Base64 or hex encoding).

Atoms can be specified either via an unquoted or a quoted form. In unquoted form the atom is written without delimiters. In quoted form the atom is delimited by double quote '\"' (U+0022) characters, it is mandatory for atoms that contain whitespace, parentheses '(' ')', semicolons ';', quotes '\"', carets '^' or characters that need to be escaped.

abc        ; a token for the atom "abc"
"abc"      ; a quoted token for the atom "abc"
"abc; (d"  ; a quoted token for the atom "abc; (d"
""         ; the quoted token for the atom ""

For atoms that do not need to be quoted, both their unquoted and quoted form represent the same string; e.g. the string "true" can be represented both by the atoms true and "true". The empty string can only be represented in quoted form by "".
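
Whether an atom must be written in quoted form can be decided by checking each character against the t-char rule of the grammar given at the end of this section. The following is a sketch over bytes with hypothetical function names; bytes above 0x7F are assumed to belong to valid UTF-8 sequences and are not checked.

```ocaml
(* True if byte [c] may appear in an unquoted token (t-char in the grammar).
   Bytes >= 0x80 are accepted as parts of UTF-8 sequences, unchecked. *)
let is_token_char (c : char) : bool =
  match c with
  | '"' | '(' | ')' | ';' | '^' -> false
  | c -> let n = Char.code c in (n > 0x20 && n < 0x7F) || n >= 0x80

(* An atom must be quoted if it is empty or if it contains a character that
   is not allowed in a token. *)
let needs_quotes (atom : string) : bool =
  let rec all_token i =
    i >= String.length atom || (is_token_char atom.[i] && all_token (i + 1))
  in
  atom = "" || not (all_token 0)
```

Note that a backslash is an ordinary token character, so an atom like C:\Windows needs no quotes.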

In quoted form escapes are introduced by a caret '^'. Double quotes '\"' and carets '^' must always be escaped.

"^^"             ; atom for ^
"^n"             ; atom for line feed U+000A
"^u{0000}"       ; atom for U+0000
"^"^u{1F42B}^""  ; atom with a quote, U+1F42B and a quote

The following escape sequences are recognized:

"^ " (<U+005E,U+0020>) for space ' ' (U+0020)
"^\"" (<U+005E,U+0022>) for double quote '\"' (U+0022) mandatory
"^^" (<U+005E,U+005E>) for caret '^' (U+005E) mandatory
"^n" (<U+005E,U+006E>) for line feed '\n' (U+000A)
"^r" (<U+005E,U+0072>) for carriage return '\r' (U+000D)
"^u{X}" where X is from 1 to at most 6 upper or lower case hexadecimal digits standing for the corresponding Unicode character U+X.
Any other character following a caret, except line feed '\n' (U+000A) or carriage return '\r' (U+000D), is an illegal sequence of characters. In these two cases the atom continues on the next line and whitespace is ignored.
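
Going from a string to its quoted form can be sketched as follows. The function quote is hypothetical; the source does not say which optional escapes the library's own encoder emits, so this sketch applies the two mandatory escapes and conservatively escapes every US-ASCII control character.

```ocaml
(* Writes [atom] in quoted form. '"' and '^' get their mandatory escapes,
   line feed and carriage return use ^n and ^r, and all other US-ASCII
   control characters (DEL included) use ^u{...}. This escapes more than
   strictly required: tab, vertical tab and form feed could stay as is.
   All other bytes are copied verbatim. *)
let quote (atom : string) : string =
  let b = Buffer.create (String.length atom + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "^\""
      | '^' -> Buffer.add_string b "^^"
      | '\n' -> Buffer.add_string b "^n"
      | '\r' -> Buffer.add_string b "^r"
      | c when Char.code c < 0x20 || Char.code c = 0x7F ->
          Buffer.add_string b (Printf.sprintf "^u{%04X}" (Char.code c))
      | c -> Buffer.add_char b c)
    atom;
  Buffer.add_char b '"';
  Buffer.contents b
```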

An atom in quoted form can be split across lines by using a caret '^' (U+005E) followed by a line feed '\n' (U+000A) or a carriage return '\r' (U+000D); any subsequent whitespace is ignored.

"^
  a^
  ^ " ; the atom "a "
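
The escape and continuation rules can be put together in a small decoder for the contents found between the double quotes of a quoted atom. This is a standard library sketch with hypothetical names (is_ws, is_hex, unescape); it does not reject unescaped control characters and is not the library's parser.

```ocaml
let is_ws = function
  | ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true | _ -> false

let is_hex = function
  | '0' .. '9' | 'a' .. 'f' | 'A' .. 'F' -> true | _ -> false

(* Decodes the contents between the quotes of a quoted atom. *)
let unescape (s : string) : (string, string) result =
  let len = String.length s in
  let b = Buffer.create len in
  let rec skip_ws i = if i < len && is_ws s.[i] then skip_ws (i + 1) else i in
  let rec all_hex i j = i >= j || (is_hex s.[i] && all_hex (i + 1) j) in
  let rec loop i =
    if i >= len then Ok (Buffer.contents b) else
    match s.[i] with
    | '"' -> Error (Printf.sprintf "%d: unescaped double quote" i)
    | '^' -> escape (i + 1)
    | c -> Buffer.add_char b c; loop (i + 1)
  and escape i =
    if i >= len then Error (Printf.sprintf "%d: truncated escape" (i - 1)) else
    match s.[i] with
    | ' ' | '"' | '^' as c -> Buffer.add_char b c; loop (i + 1)
    | 'n' -> Buffer.add_char b '\n'; loop (i + 1)
    | 'r' -> Buffer.add_char b '\r'; loop (i + 1)
    | '\n' | '\r' -> loop (skip_ws (i + 1)) (* line continuation *)
    | 'u' -> unicode (i + 1)
    | c -> Error (Printf.sprintf "%d: illegal escape ^%c" (i - 1) c)
  and unicode i =
    (* [i] is the index expected to hold '{'; the caret is at [i - 2]. *)
    let err () = Error (Printf.sprintf "%d: invalid Unicode escape" (i - 2)) in
    match String.index_from_opt s i '}' with
    | Some j when s.[i] = '{' && j - i - 1 >= 1 && j - i - 1 <= 6
                  && all_hex (i + 1) j ->
        let n = int_of_string ("0x" ^ String.sub s (i + 1) (j - i - 1)) in
        (* Only Unicode scalar values are allowed. *)
        if not (Uchar.is_valid n) then err () else
        (Buffer.add_utf_8_uchar b (Uchar.of_int n); loop (j + 1))
    | _ -> err ()
  in
  loop 0
```

Applied to the contents of the three-line example above, unescape returns the atom "a ".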

The character '^' (U+005E) is used as an escape character rather than the usual '\\' (U+005C) in order to make quoted Windows® file paths decently readable and, not least, to utterly please DKM.

Lists

Lists are delimited by left '(' (U+0028) and right ')' (U+0029) parentheses. Their elements are s-expressions separated by optional whitespace and comments. For example:

(a list (of four) expressions)
(a list(of four)expressions)
("a"list("of"four)expressions)
(a list (of ; This is a comment
four) expressions)
() ; the empty list
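
That whitespace between list elements is optional can be checked with a small parser. The sketch below uses the standard library only, with a hypothetical sexp type and parse function. It is deliberately restricted: quoted atoms are rejected and token characters are not validated beyond the delimiters.

```ocaml
type sexp = Atom of string | List of sexp list

let is_ws = function
  | ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true | _ -> false

let is_delim c = is_ws c || c = '(' || c = ')' || c = ';' || c = '"'

(* Parses a sequence of s-expressions. Quoted atoms are not supported. *)
let parse (s : string) : (sexp list, string) result =
  let len = String.length s in
  let rec skip i = (* skips whitespace and comments *)
    if i >= len then i else
    if is_ws s.[i] then skip (i + 1) else
    if s.[i] = ';' then skip_comment (i + 1) else i
  and skip_comment i =
    if i >= len then i else
    if s.[i] = '\n' || s.[i] = '\r' then skip (i + 1) else skip_comment (i + 1)
  in
  let rec token i =
    if i < len && not (is_delim s.[i]) then token (i + 1) else i
  in
  (* [seq acc i] returns the parsed elements and the index where it stopped:
     either the end of input or a closing parenthesis. *)
  let rec seq acc i =
    let i = skip i in
    if i >= len || s.[i] = ')' then (List.rev acc, i) else
    if s.[i] = '(' then begin
      let els, j = seq [] (i + 1) in
      if j >= len then failwith (Printf.sprintf "%d: unclosed list" i) else
      seq (List els :: acc) (j + 1)
    end else
    if s.[i] = '"'
    then failwith (Printf.sprintf "%d: quoted atoms unsupported" i) else
    let j = token i in
    seq (Atom (String.sub s i (j - i)) :: acc) j
  in
  match seq [] 0 with
  | els, i when i >= len -> Ok els
  | _, i -> Error (Printf.sprintf "%d: unexpected ')'" i)
  | exception Failure msg -> Error msg
```

With this sketch the first, second and fourth examples above all parse to the same value, a list of the atoms a and list, the nested list (of four) and the atom expressions.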

S-expression grammar

The following RFC 5234 ABNF grammar is defined on a sequence of Unicode characters.

 sexp-seq = *(ws / comment / sexp)
     sexp = atom / list
     list = %x0028 sexp-seq %x0029
     atom = token / qtoken
    token = t-char *(t-char)
   qtoken = %x0022 *(q-char / escape / cont) %x0022
   escape = %x005E (%x0020 / %x0022 / %x005E / %x006E / %x0072 /
                    %x0075 %x007B unum %x007D)
     unum = 1*6(HEXDIG)
     cont = %x005E nl ws
       ws = *(ws-char)
  comment = %x003B *(c-char) nl
       nl = %x000A / %x000D / %x000D %x000A
   t-char = %x0021 / %x0023-0027 / %x002A-003A / %x003C-005D /
            %x005F-%x007E / %x0080-D7FF / %xE000-10FFFF
   q-char = t-char / ws-char / %x0028 / %x0029 / %x003B
  ws-char = %x0020 / %x0009 / %x000A / %x000B / %x000C / %x000D
   c-char = %x0009 / %x000B / %x000C / %x0020-D7FF / %xE000-10FFFF

A few additional constraints not expressed by the grammar:

unum once interpreted as a hexadecimal number must be a Unicode scalar value.
A comment can be ended by the end of the character sequence rather than nl.