Module String.Sub
Substrings.
A substring defines a possibly empty subsequence of bytes in a base string.
The positions of a string s of length l are the slits found before each byte and after the last byte of the string. They are labelled from left to right by increasing number in the range [0;l].
positions  0   1   2   3   4    l-1    l
           +---+---+---+---+     +-----+
   indices | 0 | 1 | 2 | 3 | ... | l-1 |
           +---+---+---+---+     +-----+
The byte with index i lies between positions i and i+1.
Formally, we define a substring of s as a subsequence of bytes delimited by a start and a stop position. The start position is always smaller than or equal to the stop position. When both positions are equal the substring is empty. Note that for a given base string there are as many empty substrings as there are positions in the string (l + 1 for a string of length l).
Like in strings, we index the bytes of a substring using zero-based indices.
See how to use substrings to parse data.
Substrings
type t = sub
The type for substrings.
val empty : sub
empty is the empty substring of the empty string String.empty.
val v : ?start:int -> ?stop:int -> string -> sub
v ~start ~stop s is the substring of s that starts at position start (defaults to 0) and stops at position stop (defaults to String.length s). Raises Invalid_argument if start or stop are not positions of s or if stop < start.
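The position rules above can be made concrete with a small OCaml sketch. It models a substring as a hypothetical record holding a base string and its start and stop positions, and checks the same conditions that v documents. The record and the helpers are illustrative stand-ins only: the library does not expose its sub type this way, and its implementation may differ.

```ocaml
(* Hypothetical model of a substring: NOT the library's [sub] type, just a
   base string together with a start and a stop position. *)
type sub = { base : string; start : int; stop : int }

(* Mirrors the documented contract of [v]: [start] defaults to 0, [stop]
   to the length of [s], and invalid positions raise Invalid_argument. *)
let v ?(start = 0) ?stop s =
  let l = String.length s in
  let stop = match stop with None -> l | Some p -> p in
  let is_pos p = 0 <= p && p <= l in
  if not (is_pos start && is_pos stop) || stop < start then
    invalid_arg "v: invalid start or stop position"
  else { base = s; start; stop }

(* The number of bytes is the distance between the two positions. *)
let length s = s.stop - s.start

(* Copy the bytes between the two positions out of the base string. *)
let to_string s = String.sub s.base s.start (length s)

let () =
  let s = v ~start:2 ~stop:6 "Revolt now!" in
  print_endline (to_string s) (* prints "volt" *)
```

Note that v ~start:3 "abc" is valid and empty: position 3 is the position after the last byte.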
val start_pos : sub -> int
start_pos s is s's start position in the base string.
val stop_pos : sub -> int
stop_pos s is s's stop position in the base string.
val base_string : sub -> string
base_string s is s's base string.
val length : sub -> int
length s is the number of bytes in s.
val get : sub -> int -> char
get s i is the byte of s at its zero-based index i. Raises Invalid_argument if i is not an index of s.
val get_byte : sub -> int -> int
get_byte s i is Char.to_int (get s i).
val head : ?rev:bool -> sub -> char option
head s is Some (get s h) with h = 0 if rev = false (default) or h = length s - 1 if rev = true. None is returned if s is empty.
val get_head : ?rev:bool -> sub -> char
get_head s is like head but raises Invalid_argument if s is empty.
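A minimal sketch of head and get_head over the same kind of hypothetical record model (not the library's representation). Since bytes are read from the base string, the substring's index 0 corresponds to base index start, and its last index to stop - 1.

```ocaml
(* Hypothetical model of a substring: a base string plus start/stop
   positions. This is an illustration, not the library's [sub] type. *)
type sub = { base : string; start : int; stop : int }

let length s = s.stop - s.start

(* [head ~rev s]: the first byte of [s], or its last byte when [rev] is
   true; [None] for an empty substring. Indices are base-string indices. *)
let head ?(rev = false) s =
  if length s = 0 then None
  else Some s.base.[if rev then s.stop - 1 else s.start]

(* [get_head] behaves like [head] but raises on an empty substring. *)
let get_head ?rev s =
  match head ?rev s with
  | Some c -> c
  | None -> invalid_arg "get_head: empty substring"
```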
val of_string : string -> sub
of_string s is v s.
val to_string : sub -> string
to_string s is the bytes of s as a string.
val rebase : sub -> sub
rebase s is v (to_string s). This puts s on a base string made solely of its bytes.
val hash : sub -> int
hash s is Hashtbl.hash s.
Stretching substrings
See the graphical guide.
val tail : ?rev:bool -> sub -> sub
tail s is s without its first (rev is false, default) or last (rev is true) byte, or s itself if it is empty.
val extend : ?rev:bool -> ?max:int -> ?sat:(char -> bool) -> sub -> sub
extend ~rev ~max ~sat s extends s by at most max consecutive sat-satisfying bytes of the base string located after stop s (rev is false, default) or before start s (rev is true). If max is unspecified, the extension is limited only by the extents of the base string of s. sat defaults to fun _ -> true. Raises Invalid_argument if max is negative.
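The following sketch shows one way extend could walk the base string, again over a hypothetical record model rather than the library's own type. It treats an unspecified max as unbounded and stops at the first byte that fails sat or at the edge of the base string.

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

(* Sketch of [extend]: grow [s] by at most [max] consecutive bytes of its
   base string that satisfy [sat], after [stop] (default) or before
   [start] (when [rev] is true). *)
let extend ?(rev = false) ?max ?(sat = (fun _ -> true)) s =
  (match max with
   | Some m when m < 0 -> invalid_arg "extend: negative max"
   | _ -> ());
  (* No [max] means the only limit is the edge of the base string. *)
  let limit = match max with None -> max_int | Some m -> m in
  if rev then begin
    (* Move [start] left while the byte before it satisfies [sat]. *)
    let rec go i n =
      if i > 0 && n < limit && sat s.base.[i - 1] then go (i - 1) (n + 1)
      else i
    in
    { s with start = go s.start 0 }
  end else begin
    (* Move [stop] right while the byte at it satisfies [sat]. *)
    let len = String.length s.base in
    let rec go i n =
      if i < len && n < limit && sat s.base.[i] then go (i + 1) (n + 1)
      else i
    in
    { s with stop = go s.stop 0 }
  end
```

For instance, extending "volt" inside "Revolt now!" with ~sat:(fun c -> c <> ' ') leaves it unchanged, because the byte right after it is a space.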
val reduce : ?rev:bool -> ?max:int -> ?sat:(char -> bool) -> sub -> sub
reduce ~rev ~max ~sat s reduces s by at most max consecutive sat-satisfying bytes of s located before stop s (rev is false, default) or after start s (rev is true). If max is unspecified, the reduction is limited only by the extents of s. sat defaults to fun _ -> true. Raises Invalid_argument if max is negative.
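Symmetrically, reduce only ever moves a position inward and never past the other end of the substring. A sketch over the same hypothetical record model, assuming the documented defaults:

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

(* Sketch of [reduce]: shrink [s] by at most [max] consecutive bytes that
   satisfy [sat], from its end (default) or from its beginning ([rev]). *)
let reduce ?(rev = false) ?max ?(sat = (fun _ -> true)) s =
  (match max with
   | Some m when m < 0 -> invalid_arg "reduce: negative max"
   | _ -> ());
  let limit = match max with None -> max_int | Some m -> m in
  if rev then begin
    (* Move [start] right over satisfying bytes, never past [stop]. *)
    let rec go i n =
      if i < s.stop && n < limit && sat s.base.[i] then go (i + 1) (n + 1)
      else i
    in
    { s with start = go s.start 0 }
  end else begin
    (* Move [stop] left over satisfying bytes, never past [start]. *)
    let rec go i n =
      if i > s.start && n < limit && sat s.base.[i - 1] then go (i - 1) (n + 1)
      else i
    in
    { s with stop = go s.stop 0 }
  end
```

With the default sat, reduce s empties s at its start position, and reduce ~rev:true s empties it at its stop position.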
val extent : sub -> sub -> sub
extent s s' is the smallest substring that includes all the positions of s and s'. Raises Invalid_argument if s and s' are not on the same base string according to physical equality.
val overlap : sub -> sub -> sub option
overlap s s' is the smallest substring that includes all the positions common to s and s', or None if there are no such positions. Note that the overlap substring may be empty. Raises Invalid_argument if s and s' are not on the same base string according to physical equality.
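Since s and s' must share a base string, extent and overlap amount to taking the minimum and maximum of their positions. A sketch over a hypothetical record model, using OCaml's physical inequality (!=) for the base check as the documentation describes:

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

(* Both operations require the very same base string (physical equality). *)
let check_same_base fn s s' =
  if s.base != s'.base then invalid_arg (fn ^ ": different base strings")

(* Smallest substring covering every position of [s] and [s']. *)
let extent s s' =
  check_same_base "extent" s s';
  { s with start = min s.start s'.start; stop = max s.stop s'.stop }

(* Positions common to both, if any; the result may be empty. *)
let overlap s s' =
  check_same_base "overlap" s s';
  let start = max s.start s'.start and stop = min s.stop s'.stop in
  if start <= stop then Some { s with start; stop } else None
```

Two substrings that merely touch, such as 7..11 and the empty substring at 7, overlap in an empty substring rather than in None.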
Appending substrings
val append : sub -> sub -> sub
append s s' is like String.append (see Appending strings). The substrings can be on different bases and the result is on a base string that holds exactly the appended bytes.
val concat : ?sep:sub -> sub list -> sub
concat ~sep ss is like String.concat. The substrings can all be on different bases and the result is on a base string that holds exactly the concatenated bytes.
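Because the inputs may live on different bases, append and concat have to copy bytes into a fresh base string. A minimal sketch over a hypothetical record model, delegating to the standard library's (^) and String.concat:

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

let to_string s = String.sub s.base s.start (s.stop - s.start)

(* A substring covering the whole of a string. *)
let of_string s = { base = s; start = 0; stop = String.length s }

(* Both results live on a new base string holding exactly their bytes,
   so the inputs may come from different bases. *)
let append s s' = of_string (to_string s ^ to_string s')

let concat ?(sep = of_string "") ss =
  of_string (String.concat (to_string sep) (List.map to_string ss))
```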
Predicates
val is_empty : sub -> bool
is_empty s is length s = 0.
val is_prefix : affix:sub -> sub -> bool
is_prefix is like String.is_prefix. Only bytes are compared; affix can be on a different base string.
val is_infix : affix:sub -> sub -> bool
is_infix is like String.is_infix. Only bytes are compared; affix can be on a different base string.
val is_suffix : affix:sub -> sub -> bool
is_suffix is like String.is_suffix. Only bytes are compared; affix can be on a different base string.
val for_all : (char -> bool) -> sub -> bool
for_all is like String.for_all on the substring.
val exists : (char -> bool) -> sub -> bool
exists is like String.exists on the substring.
val same_base : sub -> sub -> bool
same_base s s' is true iff the substrings s and s' have the same base string according to physical equality.
val equal_bytes : sub -> sub -> bool
equal_bytes s s' is true iff the substrings s and s' have exactly the same bytes. The substrings can be on different base strings.
val compare_bytes : sub -> sub -> int
compare_bytes s s' compares the bytes of s and s' in lexicographical order. The substrings can be on different base strings.
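The difference between comparing bases and comparing bytes can be illustrated with this sketch over a hypothetical record model. It copies bytes with String.sub for simplicity; an efficient implementation would more likely compare in place without allocating.

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

let to_string s = String.sub s.base s.start (s.stop - s.start)

(* Same base: the very same string value, by physical equality. *)
let same_base s s' = s.base == s'.base

(* Byte comparisons ignore where the bytes live. *)
let equal_bytes s s' = String.equal (to_string s) (to_string s')
let compare_bytes s s' = String.compare (to_string s) (to_string s')
```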
Extracting substrings
Extracted substrings are always on the same base string as the substring s acted upon.
val with_range : ?first:int -> ?len:int -> sub -> sub
with_range is like String.sub_with_range. The indices are the substring's zero-based ones, not those in the base string.
val with_index_range : ?first:int -> ?last:int -> sub -> sub
with_index_range is like String.sub_with_index_range. The indices are the substring's zero-based ones, not those in the base string.
val trim : ?drop:(char -> bool) -> sub -> sub
trim is like String.trim. If all bytes are dropped, it returns an empty substring located in the middle of the argument.
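A sketch of trim over a hypothetical record model. The whitespace predicate and the rounding of the "middle" position ((start + stop) / 2) are assumptions made for this example, not guarantees taken from the library.

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

(* Assumed default: ASCII space, tab, newline, vertical tab, form feed, CR. *)
let is_white c =
  c = ' ' || c = '\t' || c = '\n' || c = '\011' || c = '\012' || c = '\r'

let trim ?(drop = is_white) s =
  (* First kept position from the left, or [s.stop] if everything drops. *)
  let rec left i = if i < s.stop && drop s.base.[i] then left (i + 1) else i in
  (* Position just after the last kept byte, scanning from the right. *)
  let rec right i =
    if i > s.start && drop s.base.[i - 1] then right (i - 1) else i
  in
  let l = left s.start in
  if l < s.stop then { s with start = l; stop = right s.stop }
  else
    (* Everything dropped: an empty substring in the middle (assumed rounding). *)
    let mid = (s.start + s.stop) / 2 in
    { s with start = mid; stop = mid }
```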
val span : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub * sub
span is like String.span. For a substring s, a left empty span is start s and a right empty span is stop s.
val take : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub
take is like String.take.
val drop : ?rev:bool -> ?min:int -> ?max:int -> ?sat:(char -> bool) -> sub -> sub
drop is like String.drop.
val cut : ?rev:bool -> sep:sub -> sub -> (sub * sub) option
cut is like String.cut. sep can be on a different base string.
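Cutting searches only within the substring, and both halves stay on the original base string. A simplified sketch over a hypothetical record model: it handles only the default left-to-right direction (no rev), uses a naive byte-by-byte search, and rejects an empty separator by assumption.

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

let to_string s = String.sub s.base s.start (s.stop - s.start)

(* Split [s] around the first occurrence of [sep]'s bytes. Only the bytes
   of [sep] matter, so it may live on any base string. *)
let cut ~sep s =
  let sep = to_string sep in
  if sep = "" then invalid_arg "cut: empty separator";
  let n = String.length sep in
  let rec search i =
    if i + n > s.stop then None
    else if String.sub s.base i n = sep then
      (* Both halves keep [s]'s base string. *)
      Some ({ s with stop = i }, { s with start = i + n })
    else search (i + 1)
  in
  search s.start
```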
val cuts : ?rev:bool -> ?empty:bool -> sep:sub -> sub -> sub list
cuts is like String.cuts. sep can be on a different base string.
val fields : ?empty:bool -> ?is_sep:(char -> bool) -> sub -> sub list
fields is like String.fields.
Traversing substrings
val find : ?rev:bool -> (char -> bool) -> sub -> sub option
find ~rev sat s is the substring of s (if any) that spans the first byte that satisfies sat in s after position start s (rev is false, default) or before stop s (rev is true). None is returned if there is no matching byte in s.
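A sketch of find over a hypothetical record model, showing that the result is a one-byte substring on the same base string, found by scanning forward from start s or, with rev, backward from stop s:

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

let find ?(rev = false) sat s =
  (* Forward scan over base indices start .. stop - 1. *)
  let rec fwd i =
    if i >= s.stop then None
    else if sat s.base.[i] then Some { s with start = i; stop = i + 1 }
    else fwd (i + 1)
  in
  (* Backward scan: the byte just before position [i]. *)
  let rec bwd i =
    if i <= s.start then None
    else if sat s.base.[i - 1] then Some { s with start = i - 1; stop = i }
    else bwd (i - 1)
  in
  if rev then bwd s.stop else fwd s.start
```

Bytes of the base string outside the substring are never examined.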
val find_sub : ?rev:bool -> sub:sub -> sub -> sub option
find_sub ~rev ~sub s is the substring of s (if any) that spans the first match of sub in s after position start s (rev is false, default) or before stop s (rev is true). Only bytes are compared and sub can be on a different base string. None is returned if there is no match of sub in s.
val filter : (char -> bool) -> sub -> sub
filter sat s is like String.filter. The result is on a base string that holds only the filtered bytes.
val filter_map : (char -> char option) -> sub -> sub
filter_map f s is like String.filter_map. The result is on a base string that holds only the filtered bytes.
val map : (char -> char) -> sub -> sub
map is like String.map. The result is on a base string that holds only the mapped bytes.
val mapi : (int -> char -> char) -> sub -> sub
mapi is like String.mapi. The result is on a base string that holds only the mapped bytes. The indices are the substring's zero-based ones, not those in the base string.
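To illustrate that mapi passes substring-relative indices and produces a fresh base string, here is a sketch over a hypothetical record model built on the standard library's String.init:

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

let of_string s = { base = s; start = 0; stop = String.length s }

(* [f] receives indices 0 .. length - 1 relative to the substring; the
   mapped bytes are placed on a new base string. *)
let mapi f s =
  of_string (String.init (s.stop - s.start) (fun i -> f i s.base.[s.start + i]))
```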
val fold_left : ('a -> char -> 'a) -> 'a -> sub -> 'a
fold_left is like String.fold_left.
val fold_right : (char -> 'a -> 'a) -> sub -> 'a -> 'a
fold_right is like String.fold_right.
val iter : (char -> unit) -> sub -> unit
iter is like String.iter.
val iteri : (int -> char -> unit) -> sub -> unit
iteri is like String.iteri. The indices are the substring's zero-based ones, not those in the base string.
Pretty printing
val pp : Stdlib.Format.formatter -> sub -> unit
pp ppf s prints s's bytes on ppf.
val dump : Stdlib.Format.formatter -> sub -> unit
dump ppf s prints s as a syntactically valid OCaml string on ppf using Ascii.escape_string.
val dump_raw : Stdlib.Format.formatter -> sub -> unit
dump_raw ppf s prints an unspecified raw internal representation of s on ppf.
OCaml base type conversions
val of_char : char -> sub
of_char c is a substring that contains only the byte c.
val to_char : sub -> char option
to_char s is the single byte in s, or None if s has no byte or more than one.
val of_bool : bool -> sub
of_bool b is a string representation for b. Relies on Pervasives.string_of_bool.
val to_bool : sub -> bool option
to_bool s is a bool from s, if any. Relies on Pervasives.bool_of_string.
val of_int : int -> sub
of_int i is a string representation for i. Relies on Pervasives.string_of_int.
val to_int : sub -> int option
to_int s is an int from s, if any. Relies on Pervasives.int_of_string.
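The conversion functions parse the substring's bytes. As a sketch over a hypothetical record model, to_int can be approximated with the standard library's int_of_string_opt, which returns None where int_of_string would raise Failure; like int_of_string, it accepts OCaml literal syntax such as 0x prefixes and underscores.

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

let to_string s = String.sub s.base s.start (s.stop - s.start)

(* Parse only the substring's bytes, not the whole base string. *)
let to_int s = int_of_string_opt (to_string s)
```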
val of_nativeint : nativeint -> sub
of_nativeint i is a string representation for i. Relies on Nativeint.to_string.
val to_nativeint : sub -> nativeint option
to_nativeint s is a nativeint from s, if any. Relies on Nativeint.of_string.
val of_int32 : int32 -> sub
of_int32 i is a string representation for i. Relies on Int32.to_string.
val to_int32 : sub -> int32 option
to_int32 s is an int32 from s, if any. Relies on Int32.of_string.
val of_int64 : int64 -> sub
of_int64 i is a string representation for i. Relies on Int64.to_string.
val to_int64 : sub -> int64 option
to_int64 s is an int64 from s, if any. Relies on Int64.of_string.
val of_float : float -> sub
of_float f is a string representation for f. Relies on Pervasives.string_of_float.
val to_float : sub -> float option
to_float s is a float from s, if any. Relies on Pervasives.float_of_string.
Substring stretching graphical guide
+---+---+---+---+---+---+---+---+---+---+---+
| R | e | v | o | l | t |   | n | o | w | ! |
+---+---+---+---+---+---+---+---+---+---+---+
        |---------------|                       a
        |                                       start a
                        |                       stop a
            |-----------|                       tail a
        |-----------|                           tail ~rev:true a
        |-----------------------------------|   extend a
|-----------------------|                       extend ~rev:true a
|-------------------------------------------|   base a
|-----------|                                   b
|                                               start b
            |                                   stop b
    |-------|                                   tail b
|-------|                                       tail ~rev:true b
|-------------------------------------------|   extend b
|-----------|                                   extend ~rev:true b
|-------------------------------------------|   base b
|-----------------------|                       extent a b
        |---|                                   overlap a b
                            |                   c
                            |                   start c
                            |                   stop c
                            |                   tail c
                            |                   tail ~rev:true c
                            |---------------|   extend c
|---------------------------|                   extend ~rev:true c
|-------------------------------------------|   base c
        |-------------------|                   extent a c
None                                            overlap a c
                            |---------------|   d
                            |                   start d
                                            |   stop d
                                |-----------|   tail d
                            |-----------|       tail ~rev:true d
                            |---------------|   extend d
|-------------------------------------------|   extend ~rev:true d
|-------------------------------------------|   base d
                            |---------------|   extent d c
                            |                   overlap d c
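The positions in the guide can be checked with a small sketch. Reading the diagram, a spans positions 2 to 6 ("volt"), b spans 0 to 3, c is empty at position 7 and d spans 7 to 11 ("now!"). The record model and the base helper below are hypothetical, and tail and extend are re-implemented only with their default arguments.

```ocaml
(* Hypothetical model of a substring (not the library's representation). *)
type sub = { base : string; start : int; stop : int }

(* [tail]: drop the first byte (or the last with [~rev:true]); an empty
   substring is returned unchanged. *)
let tail ?(rev = false) s =
  if s.start = s.stop then s
  else if rev then { s with stop = s.stop - 1 }
  else { s with start = s.start + 1 }

(* [extend] with its defaults (no [max], [sat] always true): grow to the
   end of the base string, or to its beginning with [~rev:true]. *)
let extend ?(rev = false) s =
  if rev then { s with start = 0 } else { s with stop = String.length s.base }

(* Hypothetical helper for the guide's "base" rows: the whole base string. *)
let base s = { s with start = 0; stop = String.length s.base }

(* Start and stop positions, for comparison with the diagram. *)
let positions s = (s.start, s.stop)

let () =
  (* Substring a of the guide: positions 2..6, i.e. the bytes "volt". *)
  let a = { base = "Revolt now!"; start = 2; stop = 6 } in
  let (p, q) = positions (tail a) in
  Printf.printf "tail a spans %d..%d\n" p q (* prints "tail a spans 3..6" *)
```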