Module String.Sub
Substrings.
A substring defines a possibly empty subsequence of bytes in a base string.
The positions of a string s of length l are the slits found before each byte and after the last byte of the string. They are labelled from left to right by increasing number in the range [0;l].
positions  0   1   2   3   4    l-1    l
           +---+---+---+---+     +-----+
  indices  | 0 | 1 | 2 | 3 | ... | l-1 |
           +---+---+---+---+     +-----+
The ith byte index is between positions i and i+1.
Formally we define a substring of s as a subsequence of bytes delimited by a start and a stop position. The former is always smaller than or equal to the latter. When both positions are equal the substring is empty. Note that for a given base string there are as many empty substrings as there are positions in the string.
Like in strings, we index the bytes of a substring using zero-based indices.
See how to use substrings to parse data.
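The position model above can be illustrated with a minimal sketch. It assumes the astring library is installed; the alias Sub and the sample base string are choices made for this example only.

```ocaml
(* Assumption: astring is available; Sub is a local alias for brevity. *)
module Sub = Astring.String.Sub

let base = "Revolt now!"                (* 11 bytes, positions 0 to 11 *)

(* Bytes between positions 2 and 6. *)
let volt = Sub.v ~start:2 ~stop:6 base

(* Start and stop are equal: an empty substring located at position 7. *)
let slit = Sub.v ~start:7 ~stop:7 base

let () =
  assert (Sub.to_string volt = "volt");
  assert (Sub.start_pos volt = 2 && Sub.stop_pos volt = 6);
  assert (Sub.length volt = 4);
  (* Indices are relative to the substring, not to the base string. *)
  assert (Sub.get volt 0 = 'v');
  assert (Sub.is_empty slit && Sub.start_pos slit = 7)
```

Creating a substring does not copy bytes here: both values refer to the same base string.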
Substrings
type t = sub
The type for substrings.
val empty : sub
empty is the empty substring of the empty string String.empty.
val v : ?start:int -> ?stop:int -> string -> sub
v ~start ~stop s is the substring of s that starts at position start (defaults to 0) and stops at position stop (defaults to String.length s).
- raises Invalid_argument if start or stop are not positions of s or if stop < start.
val start_pos : sub -> int
start_pos s is s's start position in the base string.
val stop_pos : sub -> int
stop_pos s is s's stop position in the base string.
val base_string : sub -> string
base_string s is s's base string.
val length : sub -> int
length s is the number of bytes in s.
val get : sub -> int -> char
get s i is the byte of s at its zero-based index i.
- raises Invalid_argument if i is not an index of s.
val get_byte : sub -> int -> int
get_byte s i is Char.to_int (get s i).
val head : ?rev:bool -> sub -> char option
head s is Some (get s h) with h = 0 if rev = false (default) or h = length s - 1 if rev = true. None is returned if s is empty.
val get_head : ?rev:bool -> sub -> char
get_head s is like head but
- raises Invalid_argument if s is empty.
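As an illustration of head and get_head, here is a small sketch. It assumes astring is installed; the alias Sub and the helper safe_get_head are hypothetical names introduced for this example.

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

let volt = Sub.v ~start:2 ~stop:6 "Revolt now!"

let first = Sub.head volt               (* first byte of the substring *)
let last = Sub.head ~rev:true volt      (* last byte of the substring *)
let nothing = Sub.head Sub.empty        (* no byte to return *)

(* Hypothetical helper: get_head raises on empty input, so the
   exception is caught and turned back into an option. *)
let safe_get_head s =
  try Some (Sub.get_head s) with Invalid_argument _ -> None

let () =
  assert (first = Some 'v');
  assert (last = Some 't');
  assert (nothing = None);
  assert (safe_get_head Sub.empty = None)
```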
val of_string : string -> sub
of_string s is v s.
val to_string : sub -> string
to_string s is the bytes of s as a string.
val rebase : sub -> sub
rebase s is v (to_string s). This puts s on a base string made solely of its bytes.
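The effect of rebase on the base string and on positions can be sketched as follows, assuming astring is installed (Sub is a local alias for this example).

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

let volt = Sub.v ~start:2 ~stop:6 "Revolt now!"
let rebased = Sub.rebase volt

let () =
  (* The original substring still points into the full base string. *)
  assert (Sub.base_string volt = "Revolt now!");
  (* The rebased one has a base made solely of its own bytes. *)
  assert (Sub.base_string rebased = "volt");
  assert (Sub.start_pos rebased = 0 && Sub.stop_pos rebased = 4);
  (* The bytes themselves are unchanged. *)
  assert (Sub.equal_bytes volt rebased)
```

One plausible use is keeping a small slice of a large input without holding on to the whole base string, at the cost of a copy.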
val hash : sub -> int
hash s is Hashtbl.hash s.
Stretching substrings
See the graphical guide.
val tail : ?rev:bool -> sub -> sub
tail s is s without its first (rev is false, default) or last (rev is true) byte or s if it is empty.
val extend : ?rev:bool -> ?max:int -> ?sat:(char -> bool) -> sub -> sub
extend ~rev ~max ~sat s extends s by at most max consecutive sat satisfying bytes of the base string located after stop s (rev is false, default) or before start s (rev is true). If max is unspecified the extension is limited by the extents of the base string of s. sat defaults to fun _ -> true.
- raises Invalid_argument if max is negative.
val reduce : ?rev:bool -> ?max:int -> ?sat:(char -> bool) -> sub -> sub
reduce ~rev ~max ~sat s reduces s by at most max consecutive sat satisfying bytes of s located before stop s (rev is false, default) or after start s (rev is true). If max is unspecified the reduction is limited by the extents of the substring s. sat defaults to fun _ -> true.
- raises Invalid_argument if max is negative.
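The following sketch exercises tail, extend and reduce on the substring a used in the graphical guide (positions 2 to 6 of "Revolt now!"). It assumes astring is installed; Sub and str are local shorthands introduced for this example.

```ocaml
(* Assumption: astring is available; Sub and str are local shorthands. *)
module Sub = Astring.String.Sub

let base = "Revolt now!"
let a = Sub.v ~start:2 ~stop:6 base     (* "volt" *)
let str = Sub.to_string

let () =
  (* tail drops one byte at the start, or at the end with ~rev:true. *)
  assert (str (Sub.tail a) = "olt");
  assert (str (Sub.tail ~rev:true a) = "vol");
  (* Without ~max and ~sat, extend grows up to the base string's bounds. *)
  assert (str (Sub.extend a) = "volt now!");
  assert (str (Sub.extend ~rev:true a) = "Revolt");
  (* ~max bounds the number of bytes added. *)
  assert (str (Sub.extend ~max:2 a) = "volt n");
  (* ~sat stops the extension at the first byte that fails the predicate. *)
  assert (str (Sub.extend ~sat:(fun c -> c <> '!') a) = "volt now");
  (* reduce shrinks from the end, or from the start with ~rev:true. *)
  assert (str (Sub.reduce ~max:1 a) = "vol");
  assert (str (Sub.reduce ~rev:true ~max:1 a) = "olt");
  assert (str (Sub.reduce ~sat:(fun c -> c = 't') a) = "vol");
  (* Without ~max and ~sat, every byte is removed. *)
  assert (Sub.is_empty (Sub.reduce a))
```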
val extent : sub -> sub -> sub
extent s s' is the smallest substring that includes all the positions of s and s'.
- raises Invalid_argument if s and s' are not on the same base string according to physical equality.
val overlap : sub -> sub -> sub option
overlap s s' is the smallest substring that includes all the positions common to s and s' or None if there are no such positions. Note that the overlap substring may be empty.
- raises Invalid_argument if s and s' are not on the same base string according to physical equality.
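A short sketch of extent and overlap, using the substrings a, b, c and d of the graphical guide. It assumes astring is installed; Sub and overlap_string are names introduced for this example. All substrings are built from the same base value, since both functions require physically equal base strings.

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

let base = "Revolt now!"
let a = Sub.v ~start:2 ~stop:6 base     (* "volt" *)
let b = Sub.v ~start:0 ~stop:3 base     (* "Rev" *)
let c = Sub.v ~start:7 ~stop:7 base     (* empty, at position 7 *)
let d = Sub.v ~start:7 base             (* "now!" *)

(* Hypothetical helper: the overlap as a plain string, if any. *)
let overlap_string x y =
  match Sub.overlap x y with
  | Some o -> Some (Sub.to_string o)
  | None -> None

let () =
  assert (Sub.to_string (Sub.extent a b) = "Revolt");
  assert (Sub.to_string (Sub.extent a c) = "volt ");
  (* a and b share positions 2 and 3, that is the byte 'v'. *)
  assert (overlap_string a b = Some "v");
  (* a stops at 6 and c sits at 7: no common position. *)
  assert (overlap_string a c = None);
  (* d and c share only position 7: the overlap exists but is empty. *)
  assert (overlap_string d c = Some "")
```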
Appending substrings
val append : sub -> sub -> sub
append s s' is like String.append. The substrings can be on different bases and the result is on a base string that holds exactly the appended bytes.
val concat : ?sep:sub -> sub list -> sub
concat ~sep ss is like String.concat. The substrings can all be on different bases and the result is on a base string that holds exactly the concatenated bytes.
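To illustrate that append and concat accept substrings on different bases and build a fresh base string, here is a minimal sketch. It assumes astring is installed; Sub and the sample strings are choices made for this example.

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

let a = Sub.v ~start:2 ~stop:6 "Revolt now!"   (* "volt" *)
let b = Sub.v ~start:1 ~stop:3 "amps"          (* "mp", on another base *)

let ab = Sub.append a b
let joined = Sub.concat ~sep:(Sub.v ", ") [a; b]

let () =
  assert (Sub.to_string ab = "voltmp");
  (* The result's base string holds exactly the appended bytes. *)
  assert (Sub.base_string ab = "voltmp");
  assert (Sub.to_string joined = "volt, mp");
  assert (Sub.base_string joined = "volt, mp")
```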
Predicates
val is_empty : sub -> bool
is_empty s is length s = 0.
val is_prefix : affix:sub -> sub -> bool
is_prefix is like String.is_prefix. Only bytes are compared, affix can be on a different base string.
val is_infix : affix:sub -> sub -> bool
is_infix is like String.is_infix. Only bytes are compared, affix can be on a different base string.
val is_suffix : affix:sub -> sub -> bool
is_suffix is like String.is_suffix. Only bytes are compared, affix can be on a different base string.
val for_all : (char -> bool) -> sub -> bool
for_all is like String.for_all on the substring.
val exists : (char -> bool) -> sub -> bool
exists is like String.exists on the substring.
val same_base : sub -> sub -> bool
same_base s s' is true iff the substrings s and s' have the same base string according to physical equality.
val equal_bytes : sub -> sub -> bool
equal_bytes s s' is true iff the substrings s and s' have exactly the same bytes. The substrings can be on a different base string.
val compare_bytes : sub -> sub -> int
compare_bytes s s' compares the bytes of s and s' in lexicographical order. The substrings can be on a different base string.
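The difference between comparing bases and comparing bytes can be sketched as follows, assuming astring is installed (Sub and the sample strings are introduced for this example).

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

let x = Sub.v "abc"
let y = Sub.v ~start:1 ~stop:4 "xabcx"   (* also "abc", on another base *)

let base = "abcabc"
let left = Sub.v ~stop:3 base            (* first "abc" *)
let right = Sub.v ~start:3 base          (* second "abc" *)

let () =
  (* Same bytes, different base strings. *)
  assert (Sub.equal_bytes x y);
  assert (Sub.compare_bytes x y = 0);
  assert (not (Sub.same_base x y));
  (* Same bytes and same base, at different positions. *)
  assert (Sub.same_base left right && Sub.equal_bytes left right);
  (* Affix predicates only look at bytes too. *)
  assert (Sub.is_prefix ~affix:(Sub.v "ab") y);
  assert (Sub.is_suffix ~affix:(Sub.v "bc") y);
  (* Lexicographical order: "abc" sorts before "abd". *)
  assert (Sub.compare_bytes x (Sub.v "abd") < 0)
```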
Extracting substrings
Extracted substrings are always on the same base string as the substring s acted upon.
val with_range : ?first:int -> ?len:int -> sub -> sub
with_range is like String.with_range. The indices are the substring's zero-based ones, not those in the base string.
val with_index_range : ?first:int -> ?last:int -> sub -> sub
with_index_range is like String.with_index_range. The indices are the substring's zero-based ones, not those in the base string.
val trim : ?drop:(char -> bool) -> sub -> sub
trim is like String.trim. If all bytes are dropped, it returns an empty substring located in the middle of the argument.
val span : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub * sub
span is like String.span. For a substring s a left empty span is start s and a right empty span is stop s.
val take : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub
take is like String.take.
val drop : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub
drop is like String.drop.
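A minimal sketch of span, take and drop, assuming astring is installed. The alias Sub, the predicate is_digit and the input are hypothetical and chosen for illustration.

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

(* Hypothetical predicate and input for this example. *)
let is_digit c = '0' <= c && c <= '9'
let input = Sub.v "2024rest"

(* Split at the end of the initial run of digits. *)
let digits, rest = Sub.span ~sat:is_digit input

let () =
  assert (Sub.to_string digits = "2024");
  assert (Sub.to_string rest = "rest");
  (* Both halves stay on the base string of the input. *)
  assert (Sub.same_base digits input && Sub.start_pos rest = 4);
  (* take and drop each return one side of the span. *)
  assert (Sub.to_string (Sub.take ~sat:is_digit input) = "2024");
  assert (Sub.to_string (Sub.drop ~sat:is_digit input) = "rest");
  (* With ~rev:true the bytes are taken from the end. *)
  assert (Sub.to_string (Sub.take ~rev:true ~max:2 input) = "st")
```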
val cut : ?rev:bool -> sep:sub -> sub -> (sub * sub) option
cut is like String.cut. sep can be on a different base string.
val cuts : ?rev:bool -> ?empty:bool -> sep:sub -> sub -> sub list
cuts is like String.cuts. sep can be on a different base string.
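As a sketch of cut and cuts, the example below splits a key/value line and then a separated list. It assumes astring is installed; Sub, the input line and its format are made up for this example.

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

(* Hypothetical input line. *)
let line = Sub.v "key=a,b,,c"

(* Split once, at the first occurrence of the separator. *)
let key, values =
  match Sub.cut ~sep:(Sub.v "=") line with
  | Some (k, v) -> k, v
  | None -> invalid_arg "no '=' separator"

(* Split at every occurrence; empty pieces are kept by default. *)
let all = List.map Sub.to_string (Sub.cuts ~sep:(Sub.v ",") values)

(* ~empty:false omits the empty pieces. *)
let non_empty =
  List.map Sub.to_string (Sub.cuts ~empty:false ~sep:(Sub.v ",") values)

let () =
  assert (Sub.to_string key = "key");
  assert (Sub.to_string values = "a,b,,c");
  (* The pieces are slices of the input's base string. *)
  assert (Sub.same_base values line && Sub.start_pos values = 4);
  assert (all = ["a"; "b"; ""; "c"]);
  assert (non_empty = ["a"; "b"; "c"])
```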
val fields : ?empty:bool -> ?is_sep:(char -> bool) -> sub -> sub list
fields is like String.fields.
Traversing substrings
val find : ?rev:bool -> (char -> bool) -> sub -> sub option
find ~rev sat s is the substring of s (if any) that spans the first byte that satisfies sat in s after position start s (rev is false, default) or before stop s (rev is true). None is returned if there is no matching byte in s.
val find_sub : ?rev:bool -> sub:sub -> sub -> sub option
find_sub ~rev ~sub s is the substring of s (if any) that spans the first match of sub in s after position start s (rev is false, default) or before stop s (rev is true). Only bytes are compared and sub can be on a different base string. None is returned if there is no match of sub in s.
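The sketch below shows that find and find_sub return a substring located in the searched base string, so the match position can be read back. It assumes astring is installed; Sub and the helper bounds are names introduced for this example.

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

let base = "Revolt now!"
let whole = Sub.v base

(* Hypothetical helper: start and stop positions of a match, if any. *)
let bounds = function
  | Some m -> Some (Sub.start_pos m, Sub.stop_pos m)
  | None -> None

let is_o c = c = 'o'

let () =
  (* First 'o' from the start: the one in "Revolt". *)
  assert (bounds (Sub.find is_o whole) = Some (3, 4));
  (* First 'o' from the end: the one in "now". *)
  assert (bounds (Sub.find ~rev:true is_o whole) = Some (8, 9));
  (* The needle can live on its own base string. *)
  assert (bounds (Sub.find_sub ~sub:(Sub.v "now") whole) = Some (7, 10));
  assert (bounds (Sub.find_sub ~sub:(Sub.v "later") whole) = None)
```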
val filter : (char -> bool) -> sub -> sub
filter sat s is like String.filter. The result is on a base string that holds only the filtered bytes.
val filter_map : (char -> char option) -> sub -> sub
filter_map f s is like String.filter_map. The result is on a base string that holds only the filtered bytes.
val map : (char -> char) -> sub -> sub
map is like String.map. The result is on a base string that holds only the mapped bytes.
val mapi : (int -> char -> char) -> sub -> sub
mapi is like String.mapi. The result is on a base string that holds only the mapped bytes. The indices are the substring's zero-based ones, not those in the base string.
val fold_left : ('a -> char -> 'a) -> 'a -> sub -> 'a
fold_left is like String.fold_left.
val fold_right : (char -> 'a -> 'a) -> sub -> 'a -> 'a
fold_right is like String.fold_right.
val iter : (char -> unit) -> sub -> unit
iter is like String.iter.
val iteri : (int -> char -> unit) -> sub -> unit
iteri is like String.iteri. The indices are the substring's zero-based ones, not those in the base string.
Pretty printing
val pp : Stdlib.Format.formatter -> sub -> unit
pp ppf s prints s's bytes on ppf.
val dump : Stdlib.Format.formatter -> sub -> unit
dump ppf s prints s as a syntactically valid OCaml string on ppf using Ascii.escape_string.
val dump_raw : Stdlib.Format.formatter -> sub -> unit
dump_raw ppf s prints an unspecified raw internal representation of s on ppf.
OCaml base type conversions
val of_char : char -> sub
of_char c is a substring that contains the byte c.
val to_char : sub -> char option
to_char s is the single byte in s or None if there is no byte or more than one in s.
val of_bool : bool -> sub
of_bool b is a substring representation for b. Relies on Pervasives.string_of_bool.
val to_bool : sub -> bool option
to_bool s is a bool from s, if any. Relies on Pervasives.bool_of_string.
val of_int : int -> sub
of_int i is a substring representation for i. Relies on Pervasives.string_of_int.
val to_int : sub -> int option
to_int s is an int from s, if any. Relies on Pervasives.int_of_string.
val of_nativeint : nativeint -> sub
of_nativeint i is a substring representation for i. Relies on Nativeint.to_string.
val to_nativeint : sub -> nativeint option
to_nativeint s is a nativeint from s, if any. Relies on Nativeint.of_string.
val of_int32 : int32 -> sub
of_int32 i is a substring representation for i. Relies on Int32.to_string.
val to_int32 : sub -> int32 option
to_int32 s is an int32 from s, if any. Relies on Int32.of_string.
val of_int64 : int64 -> sub
of_int64 i is a substring representation for i. Relies on Int64.to_string.
val to_int64 : sub -> int64 option
to_int64 s is an int64 from s, if any. Relies on Int64.of_string.
val of_float : float -> sub
of_float f is a substring representation for f. Relies on Pervasives.string_of_float.
val to_float : sub -> float option
to_float s is a float from s, if any. Relies on Pervasives.float_of_string.
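The conversions above combine naturally with extraction functions when parsing. The following sketch assumes astring is installed; Sub, the "port=8080" input and its format are hypothetical.

```ocaml
(* Assumption: astring is available; Sub is a local alias. *)
module Sub = Astring.String.Sub

(* Hypothetical input: parse the integer after the '=' sign. *)
let line = Sub.v "port=8080"

let port =
  match Sub.cut ~sep:(Sub.v "=") line with
  | Some (_, value) -> Sub.to_int value
  | None -> None

let () =
  assert (port = Some 8080);
  (* Conversions return None instead of raising on malformed input. *)
  assert (Sub.to_int (Sub.v "eighty") = None);
  assert (Sub.to_bool (Sub.v "true") = Some true);
  assert (Sub.to_float (Sub.v "1.5") = Some 1.5);
  (* to_char needs exactly one byte. *)
  assert (Sub.to_char (Sub.v "x") = Some 'x');
  assert (Sub.to_char (Sub.v "xy") = None);
  (* The of_* functions build a substring from a value. *)
  assert (Sub.to_string (Sub.of_int 42) = "42");
  assert (Sub.to_string (Sub.of_char 'c') = "c")
```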
Substring stretching graphical guide
+---+---+---+---+---+---+---+---+---+---+---+
| R | e | v | o | l | t |   | n | o | w | ! |
+---+---+---+---+---+---+---+---+---+---+---+
        |---------------|                     a
        |                                     start a
                        |                     stop a
            |-----------|                     tail a
        |-----------|                         tail ~rev:true a
        |-----------------------------------| extend a
|-----------------------|                     extend ~rev:true a
|-------------------------------------------| base a
|-----------|                                 b
|                                             start b
            |                                 stop b
    |-------|                                 tail b
|-------|                                     tail ~rev:true b
|-------------------------------------------| extend b
|-----------|                                 extend ~rev:true b
|-------------------------------------------| base b
|-----------------------|                     extent a b
        |---|                                 overlap a b
                            |                 c
                            |                 start c
                            |                 stop c
                            |                 tail c
                            |                 tail ~rev:true c
                            |---------------| extend c
|---------------------------|                 extend ~rev:true c
|-------------------------------------------| base c
        |-------------------|                 extent a c
                            None              overlap a c
                            |---------------| d
                            |                 start d
                                            | stop d
                                |-----------| tail d
                            |-----------|     tail ~rev:true d
                            |---------------| extend d
|-------------------------------------------| extend ~rev:true d
|-------------------------------------------| base d
                            |---------------| extent d c
                            |                 overlap d c