Module B0_std.Conv
Value converters.
A value converter describes how to encode and decode OCaml values to and from a binary representation and a textual, human specifiable, s-expression based representation.
Notation. Given a value v and a converter c we write [v]c for the textual encoding of v according to c.
Low-level encoders and decoders
exception Error of int * int * string
The exception for conversion errors. This exception is raised both by encoders and decoders with raise_notrace. The integers indicate a byte index range in the input on decoding errors; they are meaningless on encoding errors.
Note. This exception is used for defining converters. High-level converting functions do not raise but use result values to report errors.
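The split between a raising low-level decoder and a result-returning high-level function can be sketched as follows. This is a toy model, not the B0_std implementation: `Conv_error` stands in for the `Error` exception, and `dec_int` and `of_txt_sketch` are hypothetical names.

```ocaml
(* Toy stand-in for the module's Error exception: first and last byte
   index of the offending input range, plus a message. *)
exception Conv_error of int * int * string

(* A low-level decoder: raises without a backtrace on failure. *)
let dec_int s = match int_of_string_opt s with
| Some i -> i
| None ->
    let last = max 0 (String.length s - 1) in
    raise_notrace (Conv_error (0, last, "not an integer"))

(* A high-level function: catches the exception and reports a result. *)
let of_txt_sketch dec s =
  try Ok (dec s) with
  | Conv_error (first, last, msg) ->
      Error (Printf.sprintf "%d-%d: %s" first last msg)
```

With these definitions `of_txt_sketch dec_int "42"` evaluates to `Ok 42`, while a malformed input yields an `Error` string carrying the byte range.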
module Bin : sig ... end
Binary codecs.
module Txt : sig ... end
Textual codecs.
Converters
val v : kind:string -> docvar:string -> 'a Bin.enc -> 'a Bin.dec -> 'a Txt.enc -> 'a Txt.dec -> 'a t
v ~kind ~docvar bin_enc bin_dec txt_enc txt_dec is a value converter using bin_enc, bin_dec, txt_enc, txt_dec for binary and textual conversions. kind documents the kind of converted value and docvar a meta-variable used in documentation to stand for these values (use uppercase e.g. INT for integers).
val kind : 'a t -> string
kind c is the documented kind of value converted by c.
val docvar : 'a t -> string
docvar c is the documentation meta-variable for values converted by c.
val with_kind : ?docvar:string -> string -> 'a t -> 'a t
with_kind ~docvar k c is c with kind k and documentation meta-variable docvar (defaults to docvar c).
val with_docvar : string -> 'a t -> 'a t
with_docvar docvar c is c with documentation meta-variable docvar.
val with_conv : kind:string -> docvar:string -> ('b -> 'a) -> ('a -> 'b) -> 'a t -> 'b t
with_conv ~kind ~docvar to_t of_t t_conv is a converter for type 'b given a converter t_conv for type 'a and conversion functions from and to type 'b. The conversion functions should raise Error if they are not total.
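To illustrate the direction of the two conversion functions taken by with_conv, here is a minimal sketch over a toy converter type. Everything in it is an assumption made for illustration: the record `conv`, the exception `Conv_error` (standing in for `Error`), `int_conv` and the `percent` type are hypothetical and only model the textual side.

```ocaml
exception Conv_error of int * int * string

(* Toy converter: textual encoder and decoder only. *)
type 'a conv = { kind : string; enc : 'a -> string; dec : string -> 'a }

let int_conv =
  { kind = "integer";
    enc = string_of_int;
    dec = (fun s -> match int_of_string_opt s with
      | Some i -> i
      | None ->
          let last = max 0 (String.length s - 1) in
          raise_notrace (Conv_error (0, last, "not an integer"))) }

(* to_t maps 'b to 'a before encoding, of_t maps 'a to 'b after decoding. *)
let with_conv ~kind to_t of_t c =
  { kind;
    enc = (fun b -> c.enc (to_t b));
    dec = (fun s -> of_t (c.dec s)) }

type percent = Percent of int

(* of_t is not total, so it raises the conversion exception. *)
let percent_conv =
  with_conv ~kind:"percentage"
    (fun (Percent p) -> p)
    (fun i ->
       if 0 <= i && i <= 100 then Percent i
       else raise_notrace (Conv_error (0, 0, "not in range [0;100]")))
    int_conv
```

The point of the sketch is that a partial `of_t` signals failure through the same exception as the underlying decoder, so a single handler at the top level is enough.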
Converting
val to_bin : ?buf:Stdlib.Buffer.t -> 'a t -> 'a -> (string, string) Stdlib.result
to_bin c v binary encodes v using c. buf is used as the internal buffer if specified (it is cleared with Buffer.clear before usage).
val of_bin : 'a t -> string -> ('a, string) Stdlib.result
of_bin c s binary decodes a value from s using c.
val to_txt : ?buf:Stdlib.Buffer.t -> 'a t -> 'a -> (string, string) Stdlib.result
to_txt c v textually encodes v using c. buf is used as the internal buffer if specified (it is cleared with Buffer.clear before usage).
val of_txt : 'a t -> string -> ('a, string) Stdlib.result
of_txt c s textually decodes a value from s using c.
Predefined converters
val bool : bool t
bool converts booleans. Textual conversions represent booleans with the atoms true and false.
val byte : int t
byte converts a byte. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [0;255].
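As a rough illustration of what "the syntax of int_of_string" plus a range check implies, the following sketch decodes a byte from an atom using only the standard library. `byte_of_atom` is a hypothetical helper and its error messages are invented; it is not the library's decoder.

```ocaml
(* int_of_string syntax accepts decimal as well as 0x, 0o and 0b
   prefixed literals; the range check is applied afterwards. *)
let byte_of_atom s = match int_of_string_opt s with
| Some i when 0 <= i && i <= 255 -> Ok i
| Some i -> Error (Printf.sprintf "%d: not in range [0;255]" i)
| None -> Error (Printf.sprintf "%S: not an integer" s)
```

For instance `byte_of_atom "0xFF"` is `Ok 255`, whereas `byte_of_atom "256"` is an error.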
val int : int t
int converts signed OCaml integers. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [-2^(Sys.int_size-1); 2^(Sys.int_size-1)-1].
Warning. A large integer encoded on a 64-bit platform may fail to decode on a 32-bit platform, use int31 or int64 if this is a problem.
val int31 : int t
int31 converts signed 31-bit integers. Textual decoding parses an atom according to the syntax of int_of_string. Conversions fail if the integer is not in the range [-2^30; 2^30-1].
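The portability concern behind int31 can be made concrete with a small range check. This is only a sketch of the stated bounds; `int31_min`, `int31_max` and `fits_int31` are hypothetical names, not part of the module.

```ocaml
(* Bounds of the int31 range stated above: [-2^30; 2^30-1]. *)
let int31_min = -(1 lsl 30)
let int31_max = (1 lsl 30) - 1

(* True if the integer survives a round trip on a 32-bit platform,
   where OCaml integers are 31 bits wide. *)
let fits_int31 i = int31_min <= i && i <= int31_max
```

A value for which `fits_int31` is false on a 64-bit platform is exactly the kind of integer the warning on int is about.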
val int32 : int32 t
int32 converts signed 32-bit integers. Textual decoding parses an atom according to the syntax of Int32.of_string. Conversions fail if the integer is not in the range [-2^31; 2^31-1].
val int64 : int64 t
int64 converts signed 64-bit integers. Textual decoding parses an atom according to the syntax of Int64.of_string. Conversions fail if the integer is not in the range [-2^63; 2^63-1].
val float : float t
float converts floating point numbers. Textual decoding parses an atom using float_of_string.
val string_bytes : string t
string_bytes converts OCaml strings as byte sequences. Textual conversion represents the bytes of s with the s-expression (hex [s]hex) with [s]hex the atom resulting from String.Ascii.to_hex s. See also atom and string_only.
Warning. A large string encoded on a 64-bit platform may fail to decode on a 32-bit platform.
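The (hex [s]hex) representation can be approximated with the standard library alone. The sketch below assumes lowercase hexadecimal digits and does not treat the empty string specially; `to_hex` and `string_bytes_txt` are hypothetical helpers, not B0_std's String.Ascii.to_hex.

```ocaml
(* Two hexadecimal digits per byte (lowercase is an assumption). *)
let to_hex s =
  let b = Buffer.create (2 * String.length s) in
  String.iter
    (fun c -> Buffer.add_string b (Printf.sprintf "%02x" (Char.code c))) s;
  Buffer.contents b

(* Wrap the hex atom in a list tagged with the atom hex. *)
let string_bytes_txt s = Printf.sprintf "(hex %s)" (to_hex s)
```

Because the payload only contains hexadecimal digits, arbitrary bytes never clash with s-expression syntax.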
val atom : string t
atom converts strings assumed to represent UTF-8 encoded Unicode text; but the encoding is not checked. Textual conversions represent strings as atoms. See also string_bytes and string_only.
Warning. A large atom encoded on a 64-bit platform may fail to decode on a 32-bit platform.
val option : ?kind:string -> ?docvar:string -> 'a t -> 'a option t
option c converts optional values converted with c. Textual conversions represent None with the atom none and Some v with the s-expression (some [v]c).
val some : 'a t -> 'a option t
some c wraps decodes of c with Option.some.
Warning. None can't be converted in either direction, use option for this.
val result : ?kind:string -> ?docvar:string -> 'a t -> 'b t -> ('a, 'b) Stdlib.result t
result ok error converts result values with ok and error. Textual conversions represent Ok v with the s-expression (ok [v]ok) and Error e with (error [e]error).
val list : ?kind:string -> ?docvar:string -> 'a t -> 'a list t
list c converts a list of values converted with c. Textual conversions represent a list [v0; ...; vn] by the s-expression ([v0]c ... [vn]c).
Warning. A large list encoded on a 64-bit platform may fail to decode on a 32-bit platform.
val array : ?kind:string -> ?docvar:string -> 'a t -> 'a array t
array c is like list but converts arrays.
Warning. A large array encoded on a 64-bit platform may fail to decode on a 32-bit platform.
val pair : ?kind:string -> ?docvar:string -> 'a t -> 'b t -> ('a * 'b) t
pair c0 c1 converts pairs of values converted with c0 and c1. Textual conversions represent a pair (v0, v1) by the s-expression ([v0]c0 [v1]c1).
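The textual shapes described for option, result, list and pair can be modeled with a toy s-expression type. This is a sketch under the assumption that atoms never need quoting; the type `sexp` and all `enc_*` functions are hypothetical and unrelated to the Txt module's actual encoders.

```ocaml
type sexp = Atom of string | List of sexp list

(* Naive printer: atoms are emitted unquoted. *)
let rec to_string = function
| Atom a -> a
| List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"

let enc_int i = Atom (string_of_int i)

let enc_option enc = function
| None -> Atom "none"
| Some v -> List [Atom "some"; enc v]

let enc_result enc_ok enc_error = function
| Ok v -> List [Atom "ok"; enc_ok v]
| Error e -> List [Atom "error"; enc_error e]

let enc_list enc vs = List (List.map enc vs)

let enc_pair enc0 enc1 (v0, v1) = List [enc0 v0; enc1 v1]
```

For example `to_string (enc_pair enc_int (enc_option enc_int) (1, Some 2))` gives `(1 (some 2))`, showing how the representations nest.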
val enum : kind:string -> docvar:string -> ?eq:('a -> 'a -> bool) -> (string * 'a) list -> 'a t
enum ~kind ~docvar ~eq vs converts values present in vs. eq is used to test equality among values (defaults to ( = )). The list length should not exceed 256. Textual conversions use the strings of the pairs in vs as atoms to encode the corresponding value.
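The role of the association list and of eq can be sketched with two small lookups. The `color` type, `colors`, `enum_enc` and `enum_dec` are hypothetical, and the error messages are invented for the illustration.

```ocaml
type color = Red | Green | Blue

let colors = ["red", Red; "green", Green; "blue", Blue]

(* Encoding: find the atom whose value is equal to v according to eq. *)
let enum_enc ?(eq = ( = )) vs v =
  match List.find_opt (fun (_, v') -> eq v v') vs with
  | Some (name, _) -> Ok name
  | None -> Error "value not in enumeration"

(* Decoding: look the atom up in the association list. *)
let enum_dec vs atom = match List.assoc_opt atom vs with
| Some v -> Ok v
| None -> Error (Printf.sprintf "%S: unknown enumerant" atom)
```

A custom eq matters when structural equality is unsuitable for the enumerated values, for example when they contain functions.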
Non-composable predefined converters
Textual conversions performed by the following converters cannot be composed; they do not respect the syntax of s-expression atoms. They can be used for direct conversions when one does not want to be subject to the syntactic constraints of s-expressions. For example when parsing command line interface arguments or environment variables.
val string_only : string t
string_only converts OCaml strings. Textual conversion is not composable, use string_bytes or atom instead. Textual encoding passes the string as is and decoding ignores the initial starting point and returns the whole input string.
Warning. A large string encoded on a 64-bit platform may fail to decode on a 32-bit platform.
S-expressions syntax
S-expressions are a general way of describing data via atoms (sequences of characters) and lists delimited by parentheses. Here are a few examples of s-expressions and their syntax:
this-is-an-atom
(this is a list of seven atoms)
(this list contains (a nested) list)
; This is a comment
; Anything that follows a semi-colon is ignored until the next line
(this list ; has three atoms and an embedded ()
comment)
"this is a quoted atom, it can contain spaces ; and ()"
"quoted atoms can be split ^
across lines or contain Unicode esc^u{0061}pes"
We define the syntax of s-expressions over a sequence of Unicode characters in which all US-ASCII control characters except whitespace are forbidden in unescaped form.
Note. This module assumes the sequence of Unicode characters is encoded as UTF-8 although it doesn't check this for now.
S-expressions and sequences thereof
An s-expression is either an atom or a list of s-expressions interspaced with whitespace and comments. A sequence of s-expressions is a succession of s-expressions interspaced with whitespace and comments.
These elements are informally described below and finally made precise via an ABNF grammar.
Whitespace
Whitespace is a sequence of whitespace characters, namely, space ' ' (U+0020), tab '\t' (U+0009), line feed '\n' (U+000A), vertical tab (U+000B), form feed (U+000C) and carriage return '\r' (U+000D).
Comments
Unless it occurs inside an atom in quoted form (see below) anything that follows a semicolon ';' (U+003B) is ignored until the next end of line, that is either a line feed '\n' (U+000A), a carriage return '\r' (U+000D) or a carriage return and a line feed "\r\n" (<U+000D,U+000A>).
(this is not a comment) ; This is a comment
(this is not a comment)
Atoms
An atom represents ground data as a string of Unicode characters. It can, via escapes, represent any sequence of Unicode characters, including control characters and U+0000. It cannot represent an arbitrary byte sequence except via a client-defined encoding convention (e.g. Base64 or hex encoding).
Atoms can be specified either via an unquoted or a quoted form. In unquoted form the atom is written without delimiters. In quoted form the atom is delimited by double quote '\"' (U+0022) characters, it is mandatory for atoms that contain whitespace, parentheses '(' ')', semicolons ';', quotes '\"', carets '^' or characters that need to be escaped.
abc ; a token for the atom "abc"
"abc" ; a quoted token for the atom "abc"
"abc; (d" ; a quoted token for the atom "abc; (d"
"" ; the quoted token for the atom ""
For atoms that do not need to be quoted, both their unquoted and quoted form represent the same string; e.g. the string "true" can be represented both by the atoms true and "true". The empty string can only be represented in quoted form by "".
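Whether a string can be written as an unquoted token follows from the characters listed above. The sketch below works at the byte level, which is enough because every excluded character is US-ASCII; `is_t_char` and `needs_quotes` are hypothetical names and not part of the module.

```ocaml
(* Bytes allowed in an unquoted token: everything except control
   characters, space, double quote, parentheses, semicolon, caret
   and DEL. Bytes >= 0x80 belong to UTF-8 sequences and are allowed. *)
let is_t_char c = match c with
| '\x00' .. ' ' | '"' | '(' | ')' | ';' | '^' | '\x7F' -> false
| _ -> true

(* The empty string and any string with a non-token byte need quotes. *)
let needs_quotes s =
  let rec loop i =
    i < String.length s && (not (is_t_char s.[i]) || loop (i + 1))
  in
  s = "" || loop 0
```

Note that a backslash is an ordinary token character, so a string such as `C:\Windows` does not need quoting.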
In quoted form escapes are introduced by a caret '^'. Double quotes '\"' and carets '^' must always be escaped.
"^^" ; atom for ^
"^n" ; atom for line feed U+000A
"^u{0000}" ; atom for U+0000
"^"^u{1F42B}^"" ; atom with a quote, U+1F42B and a quote
The following escape sequences are recognized:
- "^ " (<U+005E,U+0020>) for space ' ' (U+0020)
- "^\"" (<U+005E,U+0022>) for double quote '\"' (U+0022), mandatory
- "^^" (<U+005E,U+005E>) for caret '^' (U+005E), mandatory
- "^n" (<U+005E,U+006E>) for line feed '\n' (U+000A)
- "^r" (<U+005E,U+0072>) for carriage return '\r' (U+000D)
- "^u{X}" where X is from 1 to at most 6 upper or lower case hexadecimal digits standing for the corresponding Unicode character U+X.
- Any other character, except line feed '\n' (U+000A) or carriage return '\r' (U+000D), following a caret is an illegal sequence of characters. In these two cases the atom continues on the next line and white space is ignored.
An atom in quoted form can be split across lines by using a caret '^' (U+005E) followed by a line feed '\n' (U+000A) or a carriage return '\r' (U+000D); any subsequent whitespace is ignored.
"^
  a^
  ^ " ; the atom "a "
The character '^' (U+005E) is used as an escape character rather than the usual '\\' (U+005C) in order to make quoted Windows® file paths decently readable and, not the least, utterly please DKM.
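The escape rules for quoted atoms can be exercised with a small decoder that takes the text between the double quotes and returns the denoted string. It is a sketch, not the module's parser: `hex_value`, `is_ws` and `unescape` are hypothetical names, the error messages are invented, and forbidden control characters are not rejected.

```ocaml
let is_ws = function
| ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true
| _ -> false

let hex_value c = match c with
| '0' .. '9' -> Some (Char.code c - 0x30)
| 'a' .. 'f' -> Some (Char.code c - 0x57)
| 'A' .. 'F' -> Some (Char.code c - 0x37)
| _ -> None

let unescape s =
  let len = String.length s in
  let b = Buffer.create len in
  let rec skip_ws i = if i < len && is_ws s.[i] then skip_ws (i + 1) else i in
  let rec loop i =
    if i >= len then Ok (Buffer.contents b) else
    match s.[i] with
    | '^' when i + 1 >= len -> Error "truncated escape"
    | '^' ->
        begin match s.[i + 1] with
        | ' ' | '"' | '^' as c -> Buffer.add_char b c; loop (i + 2)
        | 'n' -> Buffer.add_char b '\n'; loop (i + 2)
        | 'r' -> Buffer.add_char b '\r'; loop (i + 2)
        | '\n' | '\r' -> loop (skip_ws (i + 2)) (* line continuation *)
        | 'u' -> uchar (i + 2)
        | _ -> Error "illegal escape"
        end
    | '"' -> Error "unescaped double quote"
    | c -> Buffer.add_char b c; loop (i + 1)
  and uchar i =
    (* Expects {X} with X made of 1 to 6 hexadecimal digits. *)
    if i >= len || s.[i] <> '{' then Error "illegal Unicode escape" else
    let rec digits j n count =
      if j >= len then Error "illegal Unicode escape" else
      match s.[j], hex_value s.[j] with
      | '}', _ when count >= 1 && Uchar.is_valid n ->
          Buffer.add_utf_8_uchar b (Uchar.of_int n); loop (j + 1)
      | _, Some d when count < 6 -> digits (j + 1) (n * 16 + d) (count + 1)
      | _ -> Error "illegal Unicode escape"
    in
    digits (i + 1) 0 0
  in
  loop 0
```

The decoded string is UTF-8 encoded, and `Uchar.is_valid` rejects surrogates and values above U+10FFFF, which corresponds to requiring a Unicode scalar value.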
Lists
Lists are delimited by left '(' (U+0028) and right ')' (U+0029) parentheses. Their elements are s-expressions separated by optional whitespace and comments. For example:
(a list (of four) expressions)
(a list(of four)expressions)
("a"list("of"four)expressions)
(a list (of ; This is a comment
four) expressions)
() ; the empty list
S-expression grammar
The following RFC 5234 ABNF grammar is defined on a sequence of Unicode characters.
sexp-seq = *(ws / comment / sexp)
sexp = atom / list
list = %x0028 sexp-seq %x0029
atom = token / qtoken
token = t-char *(t-char)
qtoken = %x0022 *(q-char / escape / cont) %x0022
escape = %x005E (%x0020 / %x0022 / %x005E / %x006E / %x0072 /
         %x0075 %x007B unum %x007D)
unum = 1*6(HEXDIG)
cont = %x005E nl ws
ws = *(ws-char)
comment = %x003B *(c-char) nl
nl = %x000A / %x000D / %x000D %x000A
t-char = %x0021 / %x0023-0027 / %x002A-003A / %x003C-005D /
         %x005F-007E / %x0080-D7FF / %xE000-10FFFF
q-char = t-char / ws-char / %x0028 / %x0029 / %x003B
ws-char = %x0020 / %x0009 / %x000A / %x000B / %x000C / %x000D
c-char = %x0009 / %x000B / %x000C / %x0020-D7FF / %xE000-10FFFF
A few additional constraints not expressed by the grammar:
- unum once interpreted as a hexadecimal number must be a Unicode scalar value.
- A comment can be ended by the end of the character sequence rather than nl.
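As a rough companion to the grammar, here is a minimal parser for sequences of s-expressions restricted to unquoted tokens, lists, whitespace and comments. It is a sketch with stated limitations: quoted atoms are rejected, characters are handled as bytes, and the `sexp` type, `is_ws`, `is_token_end` and `parse` are hypothetical names rather than the module's API.

```ocaml
type sexp = Atom of string | List of sexp list

let is_ws = function
| ' ' | '\t' | '\n' | '\x0B' | '\x0C' | '\r' -> true
| _ -> false

let is_token_end c =
  is_ws c || c = '(' || c = ')' || c = ';' || c = '"' || c = '^'

let parse s =
  let len = String.length s in
  (* Skip whitespace and comments; a comment may end at end of input. *)
  let rec skip i =
    if i >= len then i else
    if is_ws s.[i] then skip (i + 1) else
    if s.[i] = ';' then skip_comment (i + 1) else i
  and skip_comment i =
    if i >= len then i else
    if s.[i] = '\n' || s.[i] = '\r' then skip (i + 1) else
    skip_comment (i + 1)
  in
  (* Parse a sexp-seq; stops at end of input or at a right parenthesis. *)
  let rec seq acc i =
    let i = skip i in
    if i >= len || s.[i] = ')' then (List.rev acc, i) else
    if s.[i] = '(' then begin
      let l, j = seq [] (i + 1) in
      if j >= len then failwith "unclosed list" else
      seq (List l :: acc) (j + 1)
    end else begin
      let rec tok j =
        if j < len && not (is_token_end s.[j]) then tok (j + 1) else j
      in
      let j = tok i in
      if j = i then failwith "unsupported character (no quoted atoms here)"
      else seq (Atom (String.sub s i (j - i)) :: acc) j
    end
  in
  match seq [] 0 with
  | l, i when i >= len -> Ok l
  | _ -> Error "unexpected right parenthesis"
  | exception Failure msg -> Error msg
```

With this sketch the first four list examples of the Lists section that use only unquoted tokens parse to the same value, which illustrates that whitespace and comments between elements are optional and insignificant.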