Cstruct_cap
Raw memory buffers with capabilities
Cstruct_cap wraps the Bigarray module from OCaml's standard library. Each t is a proxy consisting of an offset, a length, and the underlying Bigarray.t buffer. The module has two goals: zero-copy operation (most functions share the underlying buffer rather than copying it) and static checking, via phantom types, of read and write capabilities on that buffer.
Each 'a t is parameterized by its available capabilities, read (rd) and write (wr): reading the contents of the buffer requires the read capability, and modifying them requires the write capability. Capabilities on a buffer can only be dropped, never gained. Holding only the read capability does not guarantee that the contents stay unchanged: another code fragment may still hold the write capability to the same underlying buffer.
The functions that retrieve bytes (get_uint8 etc.) require a read capability; the functions that mutate the underlying buffer (set_uint8 etc.) require a write capability. Allocating a buffer (via create, ...) returns a t with both read and write capabilities. ro drops the write capability, wo drops the read capability. The only exception is unsafe_to_bigarray, which returns the underlying Bigarray.t.
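The capability discipline described above can be sketched with phantom types over object types. The following is a stand-alone model that uses Bytes instead of Bigarray and only the standard library; the module name Cap is hypothetical and the code is an illustration under those assumptions, not the Cstruct_cap source.

```ocaml
(* Stand-alone model of capability-tagged buffers (hypothetical module Cap). *)
module Cap : sig
  type 'a t
  type 'a rd = < rd : unit; .. > as 'a
  type 'a wr = < wr : unit; .. > as 'a
  type rdwr = < rd : unit; wr : unit >
  type ro = < rd : unit >
  val create : int -> rdwr t
  val ro : 'a rd t -> ro t
  val get_uint8 : 'a rd t -> int -> int
  val set_uint8 : 'a wr t -> int -> int -> unit
end = struct
  (* The type parameter is phantom: it never appears in the representation. *)
  type 'a t = Bytes.t
  type 'a rd = < rd : unit; .. > as 'a
  type 'a wr = < wr : unit; .. > as 'a
  type rdwr = < rd : unit; wr : unit >
  type ro = < rd : unit >
  let create len = Bytes.make len '\000'
  let ro t = t (* same storage, narrower type *)
  let get_uint8 t off = Char.code (Bytes.get t off)
  let set_uint8 t off x = Bytes.set t off (Char.chr (x land 0xff))
end

let buf = Cap.create 4
let view = Cap.ro buf
let () = Cap.set_uint8 buf 0 0x2a
let () = assert (Cap.get_uint8 view 0 = 0x2a)
(* [Cap.set_uint8 view 0 1] is rejected at compile time: [ro] has no [wr]. *)
```

Because view and buf share storage, a later write through buf is visible through view, which is why a read-only proxy is not the same thing as an immutable buffer.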
Accessors and mutators for fixed size integers (8, 16, 32, 64 bit) are provided for big-endian and little-endian encodings.
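To make the two byte orders concrete, here is a small sketch that uses the standard library's Bytes accessors (available from OCaml 4.08) as a stand-in for a capability buffer. It only illustrates the encodings; it does not call Cstruct_cap.

```ocaml
(* Standard-library Bytes used as a stand-in for a capability buffer. *)
let b = Bytes.create 4

(* Big-endian stores the most significant byte first. *)
let () = Bytes.set_int32_be b 0 0x01020304l
let () = assert (Bytes.to_string b = "\x01\x02\x03\x04")

(* The same two bytes read back differently depending on the byte order. *)
let () = assert (Bytes.get_uint16_be b 0 = 0x0102)
let () = assert (Bytes.get_uint16_le b 0 = 0x0201)
```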
type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t
Type of buffer. A t is composed of an underlying buffer.
equal a b is true iff a and b correspond to the same sequence of bytes (it uses memcmp internally). Both a and b need at least read capability rd.
val pp : Format.formatter -> 'a rd t -> unit
pp ppf t pretty-prints t on ppf. t needs read capability rd.
val length : 'a t -> int
val check_alignment : 'a t -> int -> bool
check_alignment t alignment is true if the first byte stored in the underlying buffer of t is at a memory address where address mod alignment = 0, false otherwise. The mod used has C/OCaml semantics (which differ from Python's). Typical uses are checking that a buffer is aligned to a page or disk sector boundary.
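The alignment predicate and the remark about mod can be illustrated with plain integers. The helper is_aligned below is a hypothetical name used only for this sketch; it works on an explicit address rather than on a real buffer.

```ocaml
(* Hypothetical helper: the predicate described for check_alignment,
   applied to an explicit integer address. *)
let is_aligned ~address ~alignment = address mod alignment = 0

let () = assert (is_aligned ~address:8192 ~alignment:4096)
let () = assert (not (is_aligned ~address:4100 ~alignment:4096))

(* C/OCaml semantics: the result of mod takes the sign of the dividend.
   Python's % would give 1 for the same operands. *)
let () = assert ((-7) mod 4 = -3)
```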
create len allocates a buffer and proxy with both read and write capabilities of size len. It is filled with zero bytes.
create_unsafe len allocates a buffer and proxy with both read and write capabilities of size len.
Note that the returned t will contain arbitrary data, likely including the contents of previously-deallocated cstructs.
Beware!
Forgetting to replace this data could cause your application to leak sensitive information.
sub t ~off ~len returns a proxy which shares the underlying buffer of t. It is sliced at offset off and of length len. The returned value has the same capabilities as t.
shift t len returns a proxy which shares the underlying buffer of t. The returned value starts len bytes later than the given t. The returned value has the same capabilities as t.
split ~start t len returns two proxies extracted from t. The first starts at offset start (default 0), and is of length len. The second is the remainder of t. The underlying buffer is shared, the capabilities are preserved.
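The sharing behaviour of sub, shift and split can be modelled with a record holding an offset, a length and a Bigarray. This is a simplified stand-alone sketch without capabilities; the record type proxy and the helpers get and set are assumptions made for illustration, not the library's definitions.

```ocaml
(* Simplified model of a proxy over a shared Bigarray (no capabilities). *)
type proxy = {
  buffer : (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t;
  off : int;
  len : int;
}

let create len =
  let buffer = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
  Bigarray.Array1.fill buffer '\000';
  { buffer; off = 0; len }

(* A new proxy over the same buffer: only off and len change. *)
let sub t ~off ~len =
  if off < 0 || len < 0 || off + len > t.len then invalid_arg "sub";
  { t with off = t.off + off; len }

let shift t n = sub t ~off:n ~len:(t.len - n)

let split ?(start = 0) t len = (sub t ~off:start ~len, shift t (start + len))

let get t i = Bigarray.Array1.get t.buffer (t.off + i)
let set t i c = Bigarray.Array1.set t.buffer (t.off + i) c

(* Writing through the second half is visible through the original proxy. *)
let t = create 8
let hd, tl = split t 3
let () = set tl 0 'x'
let () = assert (get t 3 = 'x')
```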
append a b allocates a buffer r of size length a + length b. Then the content of a is copied at the start of the buffer r, and b is copied behind a's end in r. a and b need at least read capability rd, the returned value has both read and write capabilities.
concat vss allocates a buffer r of size lenv vss. Each v of vss is copied into the buffer r. Each v of vss needs at least read capability rd, the returned value has both read and write capabilities.
fillv ~src ~dst copies from src to dst until src is exhausted or dst is full. It returns the number of bytes copied and the remaining data from src, if any. This is useful if you want to buffer data into fixed-size chunks. Each t of src needs at least read capability rd. dst needs at least write capability wr.
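The chunking behaviour of fillv can be modelled on standard-library strings and Bytes. In this sketch src is a list of strings and dst a Bytes value; these types are stand-ins chosen for illustration, not the library's own.

```ocaml
(* Model of fillv: src is a string list, dst a fixed-size Bytes. *)
let fillv ~src ~dst =
  let rec go src dst_off =
    match src with
    | [] -> (dst_off, [])
    | s :: rest ->
      let avail = Bytes.length dst - dst_off in
      let n = String.length s in
      if avail >= n then begin
        (* The whole element fits: copy it and continue. *)
        Bytes.blit_string s 0 dst dst_off n;
        go rest (dst_off + n)
      end else begin
        (* dst is full: copy what fits and return the leftover. *)
        Bytes.blit_string s 0 dst dst_off avail;
        (dst_off + avail, String.sub s avail (n - avail) :: rest)
      end
  in
  go src 0

let dst = Bytes.create 5
let copied, remaining = fillv ~src:[ "abc"; "defg" ] ~dst
let () = assert (copied = 5 && remaining = [ "fg" ])
```

The leftover list can be passed to the next call with a fresh dst, which is how fixed-size chunks are produced from a stream of variable-size pieces.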
rev t allocates a buffer r of size length t, and fills it with the bytes of t in reverse order. The given t needs at least read capability rd, the returned value has both read and write capabilities.
memset t x sets all bytes of t to x land 0xFF. t needs at least write capability wr.
blit src ~src_off dst ~dst_off ~len copies len bytes from src starting at index src_off to dst starting at index dst_off. It works correctly even if src and dst refer to the same underlying buffer, and the src and dst intervals overlap. This function uses memmove internally.
src needs at least read capability rd. dst needs at least write capability wr.
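The overlap guarantee is the same one that the standard library documents for Bytes.blit, so it can be demonstrated there. This sketch uses Bytes as a stand-in and does not call Cstruct_cap.

```ocaml
(* Overlapping copy within one buffer: source [0, 4) and destination [2, 6). *)
let b = Bytes.of_string "abcdef"
let () = Bytes.blit b 0 b 2 4

(* The source bytes "abcd" arrive intact, as with memmove. *)
let () = assert (Bytes.to_string b = "ababcd")
```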
blit_from_string src ~src_off dst ~dst_off ~len copies len bytes from src starting at index src_off to dst starting at index dst_off. This function uses memcpy internally.
dst needs at least write capability wr.
blit_from_bytes src ~src_off dst ~dst_off ~len copies len bytes from src starting at index src_off to dst starting at index dst_off. This uses memcpy internally.
dst needs at least write capability wr.
of_string ~off ~len s allocates a buffer and copies the contents of s into it starting at offset off (default 0) and of length len (default String.length s - off). The returned value has both read and write capabilities.
to_string ~off ~len t is the string representation of the segment of t starting at off (default 0) of size len (default length t - off). t needs at least read capability rd.
of_hex ~off ~len s allocates a buffer and copies the content of s starting at offset off (default 0) of length len (default String.length s - off), decoding the hex-encoded characters. Whitespace in the string is ignored, and every pair of hex-encoded characters in s is converted to one byte in the returned t, whose size is exactly half the number of non-whitespace characters of s from off of length len.
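The decoding rule can be sketched on plain strings. The function decode_hex is a hypothetical stand-alone model that returns a string; the exact set of characters treated as whitespace and the handling of an odd digit count are assumptions of this sketch, not confirmed behaviour of of_hex.

```ocaml
(* Hypothetical model of hex decoding that skips whitespace. *)
let decode_hex s =
  let digit c =
    match c with
    | '0' .. '9' -> Char.code c - Char.code '0'
    | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
    | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
    | _ -> invalid_arg "decode_hex: not a hex digit"
  in
  (* Assumption: these four characters count as whitespace. *)
  let is_space = function ' ' | '\t' | '\r' | '\n' -> true | _ -> false in
  let out = Buffer.create (String.length s / 2) in
  let pending = ref None in
  String.iter
    (fun c ->
      if not (is_space c) then
        match !pending with
        | None -> pending := Some (digit c)
        | Some hi ->
          (* Two digits make one byte: high nibble first. *)
          Buffer.add_char out (Char.chr ((hi lsl 4) lor digit c));
          pending := None)
    s;
  if !pending <> None then invalid_arg "decode_hex: odd number of digits";
  Buffer.contents out

let () = assert (decode_hex "de ad\nBE EF" = "\xde\xad\xbe\xef")
```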
of_bytes ~off ~len b allocates a buffer and copies the contents of b into it starting at offset off (default 0) and of length len (default Bytes.length b - off). The returned value has both read and write capabilities.
to_bytes ~off ~len t is the bytes representation of the segment of t starting at off (default 0) of size len (default length t - off). t needs at least read capability rd.
blit_to_bytes src ~src_off dst ~dst_off ~len copies len bytes from src, starting at index src_off, to the byte sequence dst, starting at index dst_off. blit_to_bytes uses memcpy internally.
src needs at least read capability rd.
of_bigarray ~off ~len b is a proxy that contains b with offset off (default 0) of length len (default Bigarray.Array1.dim b - off). The returned value has both read and write capabilities.
unsafe_to_bigarray t converts t into a buffer Bigarray, using the Bigarray slicing to allocate a fresh proxy Bigarray that preserves sharing of the underlying buffer.
In other words:
let t = Cstruct_cap.create 10 in
let b = Cstruct_cap.unsafe_to_bigarray t in
Bigarray.Array1.set b 0 '\x42' ;
assert (Cstruct_cap.get_char t 0 = '\x42')
iter lenf of_cstruct t is an iterator over t that returns elements of size lenf t and type of_cstruct t. t needs at least read capability rd and iter keeps the capabilities of t on of_cstruct.
val fold : ('acc -> 'x -> 'acc) -> 'x iter -> 'acc -> 'acc
fold f iter acc is (f iterN accN ... (f iter acc)...).
get_char t off returns the character contained in t at offset off. t needs at least read capability rd.
set_char t off c sets the character contained in t at offset off to character c. t needs at least write capability wr.
get_uint8 t off returns the byte contained in t at offset off. t needs at least read capability rd.
set_uint8 t off x sets the byte contained in t at offset off to byte x. t needs at least write capability wr.
module BE : sig ... end
module LE : sig ... end
Like Cstruct, the capabilities interface provides helper functions to help the user parse contents.
head cs is Some (get cs h) with h = 0 if rev = false (default) or h = length cs - 1 if rev = true. None is returned if cs is empty.
tail cs is cs without its first (rev is false, default) or last (rev is true) byte, or cs itself if it is empty.
is_prefix ~affix cs is true iff affix.[zidx] = cs.[zidx] for all indices zidx of affix.
is_suffix ~affix cs is true iff affix.[n - zidx] = cs.[m - zidx] for all indices zidx of affix with n = length affix - 1 and m = length cs - 1.
is_infix ~affix cs is true iff there exists an index z in cs such that for all indices zidx of affix we have affix.[zidx] = cs.[z + zidx].
for_all p cs is true iff for all indices zidx of cs, p cs.[zidx] = true.
exists p cs is true iff there exists an index zidx of cs with p cs.[zidx] = true.
trim ~drop cs is cs with prefix and suffix bytes satisfying drop in cs removed. drop defaults to function ' ' | '\r' .. '\t' -> true | _ -> false.
span ~rev ~min ~max ~sat cs is (l, r) where:
If rev is false (default), l is at least min and at most max consecutive sat satisfying initial bytes of cs or empty if there are no such bytes. r are the remaining bytes of cs.
If rev is true, r is at least min and at most max consecutive sat satisfying final bytes of cs or empty if there are no such bytes. l are the remaining bytes of cs.
If max is unspecified the span is unlimited. If min is unspecified it defaults to 0. If min > max the condition can't be satisfied and the left or right span, depending on rev, is always empty. sat defaults to (fun _ -> true).
The invariant l ^ r = cs holds.
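The min and max rules are easier to follow on a concrete model. The sketch below implements the forward direction only (rev = false) on plain strings; it is a stand-alone illustration of the rules stated above, not the library code, and is_digit is a helper defined here for the example.

```ocaml
(* Forward-only model of span on strings. *)
let span ?(min = 0) ?(max = max_int) ?(sat = (fun _ -> true)) s =
  let len = String.length s in
  (* Count leading bytes satisfying sat, stopping at max. *)
  let rec count i =
    if i < len && i < max && sat s.[i] then count (i + 1) else i
  in
  let n = if min > max then 0 else count 0 in
  (* Fewer than min matching bytes means the matching span is empty. *)
  let n = if n < min then 0 else n in
  (String.sub s 0 n, String.sub s n (len - n))

let is_digit = function '0' .. '9' -> true | _ -> false

let () = assert (span ~min:1 ~max:10 ~sat:is_digit "2024-01" = ("2024", "-01"))
let () = assert (span ~min:1 ~sat:is_digit "abc" = ("", "abc"))
```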
For instance, the ABNF expression:
time := 1*10DIGIT
can be translated to:
let (time, _) = span ~min:1 ~max:10 ~sat:is_digit cs in
take ~rev ~min ~max ~sat cs is the matching span of span without the remaining one. In other words:
(if rev then snd else fst) @@ span ~rev ~min ~max ~sat cs
drop ~rev ~min ~max ~sat cs is the remaining span of span without the matching one. In other words:
(if rev then fst else snd) @@ span ~rev ~min ~max ~sat cs
cut ~sep cs is either the pair Some (l, r) of the two (possibly empty) sub-buffers of cs that are delimited by the first match of the non-empty separator string sep or None if sep can't be matched in cs. Matching starts from the beginning of cs (rev is false, default) or the end (rev is true).
The invariant l ^ sep ^ r = cs holds.
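As an illustration of the first-match rule and the invariant, here is a forward-only model of cut on plain strings, using a naive search. It is a stand-alone sketch and makes no claim about how the library searches for the separator.

```ocaml
(* Forward-only model of cut on strings, with a naive substring search. *)
let cut ~sep s =
  let n = String.length s and m = String.length sep in
  if m = 0 then invalid_arg "cut: empty separator";
  (* Index of the first occurrence of sep in s, if any. *)
  let rec find i =
    if i + m > n then None
    else if String.sub s i m = sep then Some i
    else find (i + 1)
  in
  match find 0 with
  | None -> None
  | Some i -> Some (String.sub s 0 i, String.sub s (i + m) (n - i - m))

let () = assert (cut ~sep:":" "Host: example" = Some ("Host", " example"))
let () = assert (cut ~sep:":" "no separator" = None)
```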
For instance, the ABNF expression:
field_name := *PRINT
field_value := *ASCII
field := field_name ":" field_value
can be translated to:
match cut ~sep:":" value with
| Some (field_name, field_value) -> ...
| None -> invalid_arg "invalid field"
cuts ~sep cs is the list of all sub-buffers of cs that are delimited by matches of the non-empty separator sep. Empty sub-buffers are omitted from the list if empty is false (defaults to true).
Matching separators in cs starts from the beginning of cs (rev is false, default) or the end (rev is true). Once one is found, the separator is skipped and matching starts again, that is separator matches can't overlap. If there is no separator match in cs, the list [cs] is returned.
The following invariants hold:
concat ~sep (cuts ~empty:true ~sep cs) = cs
cuts ~empty:true ~sep cs <> []
For instance, the ABNF expression:
arg := *(ASCII / ",") ; any character except ","
args := arg *("," arg)
can be translated to:
let args = cuts ~sep:"," buffer in
fields ~empty ~is_sep cs is the list of (possibly empty) sub-buffers that are delimited by bytes for which is_sep is true. Empty sub-buffers are omitted from the list if empty is false (defaults to true). If it is not defined by the user, is_sep c is true iff c is a US-ASCII white space character, that is one of space ' ' (0x20), tab '\t' (0x09), newline '\n' (0x0a), vertical tab (0x0b), form feed (0x0c), carriage return '\r' (0x0d).
find ~rev sat cs is the sub-buffer of cs (if any) that spans the first byte that satisfies sat in cs after position start cs (rev is false, default) or before stop cs (rev is true). None is returned if there is no matching byte in cs.
find_sub ~rev ~sub cs is the sub-buffer of cs (if any) that spans the first match of sub in cs after position start cs (rev is false, default) or before stop cs (rev is true). Only bytes are compared and sub can be on a different base buffer. None is returned if there is no match of sub in cs.
filter sat cs is the buffer made of the bytes of cs that satisfy sat, in the same order.
filter_map f cs is the buffer made of the bytes of cs as mapped by f, in the same order.
map f cs is cs' with cs'.[i] = f cs.[i] for all indices i of cs. f is invoked in increasing index order.